
Rust Regex Tutorial | Extract Emails & URLs with Pattern Matching | Rust by Examples #12

Sandy Lane

Video: Rust Regex Tutorial | Extract Emails & URLs with Pattern Matching | Rust by Examples #12 by Taught by Celeste AI - AI Coding Coach


Rust Regex Tutorial: Extract Emails & URLs with Pattern Matching

In this tutorial, we use the regex crate (an external crate, so it must be listed as a dependency in Cargo.toml) to find and extract email addresses and URLs from text. We cover writing patterns with capture groups, iterating over all matches, and using search-and-replace to redact sensitive data.

Code

use regex::Regex;
use std::fs;

// Read a sample text file and extract emails and URLs using regex patterns
fn main() {
  // Load the text from a file
  let text = fs::read_to_string("sample.txt").expect("Failed to read file");

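  // Regex pattern for emails with two capture groups: username and domain.
  // The username may contain dots, plus signs, and hyphens (e.g. first.last),
  // and the domain must contain at least one dot and cannot end with one.
  let email_re = Regex::new(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)").unwrap();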

  // Regex pattern for URLs starting with http or https
  let url_re = Regex::new(r"https?://[^\s]+").unwrap();

  // Check if any email exists in the text
  if email_re.is_match(&text) {
    println!("Emails found:");

    // Iterate over all email matches with capture groups
    for (i, cap) in email_re.captures_iter(&text).enumerate() {
      println!("{}: user='{}', domain='{}'", i + 1, &cap[1], &cap[2]);
    }
  }

  // Find and print all URLs
  println!("\nURLs found:");
  for (i, url_match) in url_re.find_iter(&text).enumerate() {
    println!("{}: {}", i + 1, url_match.as_str());
  }

  // Redact all emails in the text by replacing them with [redacted]
  let redacted_text = email_re.replace_all(&text, "[redacted]");
  println!("\nRedacted text:\n{}", redacted_text);
}

Key Points

  • The regex crate provides is_match, find_iter, and captures_iter for pattern matching: is_match tests whether any match exists, find_iter yields each full match, and captures_iter yields each match together with its capture groups.
  • Capture groups, defined with parentheses, allow extraction of specific parts of a match, such as the username and domain of an email address.
  • Raw string literals (r"...") let you write regex escapes such as \w and \. with single backslashes, because Rust does not process escape sequences inside them.
  • replace_all replaces every non-overlapping match with the given replacement, which makes it useful for redacting sensitive information.
  • Combining regex with file I/O (here, fs::read_to_string) lets you build practical text-processing tools in Rust.