Word Frequency Counter

The best way to learn any programming language is to develop small, meaningful programs that are easy to write and serve a purpose. In this three-part article, we'll develop several small command line programs. Each part will focus on a different area of the standard library.

In this article, we'll develop a simple program that counts how many times a word occurs in a plain text file. The program will accept a command line argument as the filename and output each word with a total count.

To get command line arguments, we need to use the std::env::args() function. This function returns an iterator that yields each argument in turn. We can call the collect() method on the iterator to gather the arguments into a collection. Listing 1 below collects them into a vector of strings (Vec<String>).

Listing 1

use std::env;

fn main() {
    // Collect the iterator of arguments into a Vec<String>.
    let args: Vec<String> = env::args().collect();
}

The args variable now contains a list of command line arguments. One thing to remember is that the first item in the vector is the name of the program. This means that if we supply a single argument, the length of args will be 2. As our program requires exactly one argument, we can use the len() method of the vector to add some basic validation.

Listing 2

...
    let args: Vec<String> = env::args().collect();

    // args[0] is the program name, so one supplied argument gives a length of 2.
    if args.len() == 2 {
        // To do
    } else {
        println!("Command requires file name argument");
    }

We can now get references to the items in the args variable either by using the get() method of the vector or by using an index. Once we have a reference to the argument, we can use the std::fs module to read the file. The easiest way to read a file is to use the read_to_string() function, which returns a Result type.

Listing 3

        // Requires `use std::fs;` at the top of the file.
        let file_name = &args[1];

        let result = fs::read_to_string(file_name);
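
Listing 3 uses an index, which is safe here only because the length has already been checked. As a minimal sketch of the get() alternative mentioned above, the hypothetical helper file_name_arg below (not part of the original program) returns an Option instead of panicking when the argument is missing:

```rust
use std::env;

// Hypothetical helper for illustration: returns the first user-supplied
// argument, if there is one.
fn file_name_arg(args: &[String]) -> Option<&String> {
    // args[0] is the program name, so the file name sits at index 1.
    // get() returns None rather than panicking when the index is out of range.
    args.get(1)
}

fn main() {
    let args: Vec<String> = env::args().collect();

    match file_name_arg(&args) {
        Some(file_name) => println!("File name: {}", file_name),
        None => println!("Command requires file name argument"),
    }
}
```

With get(), the presence check and the lookup happen in one step. Note that, unlike the len() == 2 check, this sketch does not reject extra arguments.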

As read_to_string() returns a Result type, we can use pattern matching to handle the Ok and Err variants. If the result is successful, the Ok variant will supply us with the file contents. Once we have the data, we can split it using the space character to get a collection of individual words.
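
One caveat when splitting on the space character is that newlines and repeated spaces are not treated as separators. The sketch below compares split(" ") with the standard split_whitespace() method on a made-up sample string; the helper names split_on_space and split_on_whitespace are assumptions for illustration only:

```rust
// Hypothetical helper: splits on single space characters only.
fn split_on_space(text: &str) -> Vec<&str> {
    text.split(" ").collect()
}

// Hypothetical helper: splits on any run of whitespace, including newlines.
fn split_on_whitespace(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn main() {
    // Sample text with a double space and a newline.
    let text = "the quick  brown\nfox";

    // The double space produces an empty string, and "brown\nfox"
    // stays together because a newline is not a space.
    println!("{:?}", split_on_space(text));

    // Every word comes out on its own, with no empty entries.
    println!("{:?}", split_on_whitespace(text));
}
```

For a file that spans several lines, split_whitespace() is likely the closer match to what we mean by "words".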

The easiest way to count the occurrences of a word is to store each word in a HashMap, using the word as the key and its count as the value. We can do this by iterating over the words: if the current word already exists in the HashMap, we increment its counter; otherwise, we add it with an initial value of 1. Listing 4 below shows the complete code.

Listing 4

use std::env;
use std::fs;
use std::collections::HashMap;

fn main() {
    let args: Vec<String> = env::args().collect();

    // args[0] is the program name, so one supplied argument gives a length of 2.
    if args.len() == 2 {
        let file_name = &args[1];

        let result = fs::read_to_string(file_name);

        match result {
            Ok(s) => {
                // The keys are string slices that borrow from the file contents in s.
                let mut word_map: HashMap<&str, i32> = HashMap::new();

                let words = s.split(" ");

                for word in words {
                    if word_map.contains_key(word) {
                        // Word seen before: read the current count and store count + 1.
                        let count = word_map.get(word).unwrap_or(&0);
                        let total = *count + 1;
                        word_map.insert(word, total);
                    } else {
                        // First occurrence of this word.
                        word_map.insert(word, 1);
                    }
                }

                for (k, v) in word_map {
                    println!("{} = {}", k, v);
                }
            }
            Err(e) => {
                println!("{}", e);
            }
        }
    } else {
        println!("Command requires file name argument");
    }
}

A few things to note about the code above. word_map.get(..) returns an Option<&i32>. We could have used match to handle Some and None, but instead we've used the unwrap_or() method to supply a default value. Since the value contained in Some is a reference to an i32, the default passed to unwrap_or() must also be a reference to an i32, which is why we pass &0.
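
For comparison, here is a minimal sketch of the match alternative. It wraps the counting loop in a hypothetical count_words function (an assumption for illustration, not part of Listing 4) and handles Some and None directly, which also removes the need for the separate contains_key() check:

```rust
use std::collections::HashMap;

// Hypothetical helper for illustration: counts words separated by spaces.
fn count_words(text: &str) -> HashMap<&str, i32> {
    let mut word_map: HashMap<&str, i32> = HashMap::new();

    for word in text.split(" ") {
        // get() returns Option<&i32>; match on it to compute the new total.
        let total = match word_map.get(word) {
            Some(count) => *count + 1, // word seen before
            None => 1,                 // first occurrence
        };
        word_map.insert(word, total);
    }

    word_map
}

fn main() {
    let counts = count_words("the cat and the hat");
    println!("{:?}", counts.get("the"));
}
```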

The count variable is a reference to a value, which means we need to dereference it before using it in the addition. Once the HashMap has been constructed, we can iterate over the items and print each word and its count.
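
As a possible refinement, the standard library's entry API can replace the lookup-then-insert pattern with a single call. The sketch below is one way to write it, not the approach used in Listing 4; count_words and sorted_counts are hypothetical helper names. Because a HashMap does not iterate in any particular order, the sketch also sorts the pairs before printing so the output is stable:

```rust
use std::collections::HashMap;

// Hypothetical helper for illustration: counts words using the entry API.
fn count_words(text: &str) -> HashMap<&str, i32> {
    let mut word_map: HashMap<&str, i32> = HashMap::new();

    for word in text.split(" ") {
        // or_insert(0) returns a mutable reference to the existing count,
        // or to a freshly inserted 0, which we then increment.
        *word_map.entry(word).or_insert(0) += 1;
    }

    word_map
}

// Hypothetical helper: copies the pairs out and sorts them by word.
fn sorted_counts<'a>(word_map: &HashMap<&'a str, i32>) -> Vec<(&'a str, i32)> {
    let mut pairs: Vec<(&str, i32)> = word_map.iter().map(|(k, v)| (*k, *v)).collect();
    pairs.sort();
    pairs
}

fn main() {
    let counts = count_words("the cat and the hat");

    for (word, count) in sorted_counts(&counts) {
        println!("{} = {}", word, count);
    }
}
```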
