Rust Word Frequency Counter with HashMap 📊 | Rust by Examples #10

Sandy Lane

Video: Rust Word Frequency Counter with HashMap 📊 | Rust by Examples #10, taught by Celeste AI - AI Coding Coach


Rust Word Frequency Counter with HashMap

Learn how to build a word frequency counter in Rust using the HashMap collection and the Entry API for efficient counting. This example reads text, counts occurrences of each word, and displays the top N most frequent words sorted by their counts.

Code

use std::collections::HashMap;
use std::fs;

// Counts the frequency of each word in the given text slice
fn count_words(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();

    for word in text.split_whitespace() {
        // Normalize word to lowercase for consistent counting
        let word = word.to_lowercase();
        // Use the Entry API to insert or update the count in a single lookup
        *counts.entry(word).or_insert(0) += 1;
    }

    counts
}

// Returns the top n words sorted by frequency in descending order
fn top_words(counts: &HashMap<String, usize>, n: usize) -> Vec<(String, usize)> {
    let mut word_counts: Vec<(String, usize)> = counts
        .iter()
        .map(|(word, &count)| (word.clone(), count))
        .collect();

    // Sort by count descending; ties appear in arbitrary order,
    // since HashMap iteration order is unspecified
    word_counts.sort_by(|a, b| b.1.cmp(&a.1));

    // Take the top n results
    word_counts.into_iter().take(n).collect()
}

fn main() {
    // Read the sample text file
    let text = fs::read_to_string("sample.txt").expect("Failed to read sample.txt");

    // Count words
    let counts = count_words(&text);

    // Get the top 5 words
    let top = top_words(&counts, 5);

    println!("Top 5 words by frequency:");
    for (word, count) in top {
        println!("{}: {}", word, count);
    }
}
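Note that `split_whitespace()` alone leaves punctuation attached, so `hello` and `hello,` are counted as different words. A minimal variant that also trims surrounding punctuation could look like this (`count_words_clean` is a hypothetical name, not part of the original example):

```rust
use std::collections::HashMap;

// Variant of count_words that trims leading/trailing non-alphanumeric
// characters, so "Hello," and "hello" land on the same key.
fn count_words_clean(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Strip surrounding punctuation, then lowercase
        let word = word
            .trim_matches(|c: char| !c.is_alphanumeric())
            .to_lowercase();
        // Skip tokens that were pure punctuation
        if !word.is_empty() {
            *counts.entry(word).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let counts = count_words_clean("Hello, world! hello...");
    println!("{:?}", counts);
}
```

`trim_matches` with a `char` predicate removes matching characters from both ends only, so hyphens or apostrophes inside a word (e.g. `don't`) are preserved.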

Key Points

  • HashMap is ideal for counting occurrences, with words as keys and counts as values.
  • The Entry API with entry().or_insert() simplifies updating counts without extra lookups.
  • Text processing uses split_whitespace() and lowercase normalization for consistent counting.
  • Sorting a vector of (word, count) tuples by count descending helps find the most frequent words.
  • Reading from a file and modularizing code makes the counter reusable and clean.
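Sorting the whole vector costs O(m log m) in the number of distinct words m. When only the top n are needed, a common alternative is a min-heap capped at size n, which runs in O(m log n). This is a sketch of that approach (`top_words_heap` is a hypothetical helper, not from the video):

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// Top-n selection with a size-n min-heap instead of a full sort.
// Reverse turns Rust's max-heap BinaryHeap into a min-heap, so the
// root is always the smallest count currently kept.
fn top_words_heap(counts: &HashMap<String, usize>, n: usize) -> Vec<(String, usize)> {
    let mut heap: BinaryHeap<Reverse<(usize, String)>> = BinaryHeap::new();
    for (word, &count) in counts {
        heap.push(Reverse((count, word.clone())));
        if heap.len() > n {
            heap.pop(); // evict the current smallest entry
        }
    }
    // Drain the heap and present results sorted by count descending
    let mut result: Vec<(String, usize)> = heap
        .into_iter()
        .map(|Reverse((count, word))| (word, count))
        .collect();
    result.sort_by(|a, b| b.1.cmp(&a.1));
    result
}

fn main() {
    let mut counts = HashMap::new();
    for (w, c) in [("the", 5), ("quick", 1), ("lazy", 2)] {
        counts.insert(w.to_string(), c);
    }
    // Prints [("the", 5), ("lazy", 2)]
    println!("{:?}", top_words_heap(&counts, 2));
}
```

For the small inputs in this example the difference is negligible; the heap version matters when the vocabulary is large and n is small.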