Count File Types In Directory

In this last article on developing a command line program, we'll build a program that counts how many files of each type a directory and its sub-directories contain. The program accepts a directory as an argument and recursively scans all of its sub-directories, using a HashMap to keep track of the count for each file type.

Let's begin by setting up the main function to collect command line arguments and a HashMap to store the running count for each file type.

Listing 1

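use std::collections::HashMap;
use std::env;

fn main() {
    // Command line arguments; args[0] is the program name.
    let args: Vec<String> = env::args().collect();

    // Maps each file extension to the number of files seen with it.
    let mut map: HashMap<String, usize> = HashMap::new();
}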
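Since the running count is the heart of the program, it may help to see how a HashMap can hold it. The following is a minimal sketch using the standard entry() API, which inserts 0 the first time an extension is seen and then increments it. The helper name record_extension is hypothetical and not part of the program in this article.

```rust
use std::collections::HashMap;

// Hypothetical helper: bump the count for one extension.
// entry() returns the existing slot, or inserts 0 first, and we add 1.
fn record_extension(map: &mut HashMap<String, usize>, ext: &str) {
    *map.entry(ext.to_string()).or_insert(0) += 1;
}

fn main() {
    let mut map: HashMap<String, usize> = HashMap::new();

    for ext in ["rs", "toml", "rs", "md", "rs"] {
        record_extension(&mut map, ext);
    }

    // "rs" was recorded three times; a key never seen falls back to 0.
    println!("rs: {}", map.get("rs").copied().unwrap_or(0));
    println!("txt: {}", map.get("txt").copied().unwrap_or(0));
}
```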

Rust provides a few handy functions for file system tasks. One of them is fs::read_dir(), which returns (wrapped in an io::Result) an iterator over the entries in a directory. However, read_dir() only covers the specified directory, so we'll need to write our own function that recursively visits every sub-directory.
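To see that read_dir() on its own is not recursive, here is a small sketch that lists only the entries directly inside one directory. The helper list_entries is a hypothetical name; it propagates errors with the ? operator rather than unwrapping them, and it sorts the names because the standard library does not guarantee any iteration order.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: list the names of the entries directly inside `dir`.
// fs::read_dir does not descend into sub-directories, so nested files
// are not included.
fn list_entries(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        names.push(entry.file_name().to_string_lossy().into_owned());
    }
    // read_dir yields entries in an unspecified, platform-dependent order.
    names.sort();
    Ok(names)
}

fn main() -> io::Result<()> {
    for name in list_entries(Path::new("."))? {
        println!("{name}");
    }
    Ok(())
}
```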

To make the function recursive, we need to know whether an entry yielded by the iterator is a directory or a file. If it's a directory, the function can call itself with the new directory path.

Listing 2

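use std::fs;

// Recursively visit every entry under `path`.
fn read_directory(path: &str) {
    // read_dir() returns an io::Result<ReadDir>; errors are ignored via unwrap().
    let dir = fs::read_dir(path).unwrap();

    for result in dir {
        // Each item yielded by the iterator is an io::Result<DirEntry>.
        let entry = result.unwrap();
        let metadata = entry.metadata().unwrap();

        if metadata.is_dir() {
            // Recurse into the sub-directory.
            read_directory(entry.path().to_str().unwrap());
        } else {
            // TODO: count the file by its extension (completed in listing 3)
        }
    }
}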

The read_directory() function in listing 2 accepts a directory path as a &str and passes it to the read_dir() function from the fs module. read_dir() returns an io::Result<ReadDir>; once unwrapped, the ReadDir value is an iterator that we can use to iterate over the entries in the directory. The iterator yields io::Result<DirEntry> values, because new errors can occur during the iteration itself. Unwrapping each of these gives a DirEntry, which has a metadata() method.
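Because there are two layers of Result here, one from read_dir() itself and one per entry, they can be handled separately instead of unwrapped. The following sketch (the function name count_readable_entries is hypothetical) returns an error if the directory cannot be opened at all, but simply skips individual entries that fail during iteration.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: count the entries in `dir` that could be read.
fn count_readable_entries(dir: &Path) -> io::Result<usize> {
    // First layer: opening the directory can fail (missing path, no permission).
    let entries = fs::read_dir(dir)?;

    let mut count = 0;
    for result in entries {
        // Second layer: each item is an io::Result<DirEntry>.
        match result {
            Ok(_entry) => count += 1,
            Err(err) => eprintln!("skipping entry: {err}"),
        }
    }
    Ok(count)
}

fn main() {
    match count_readable_entries(Path::new(".")) {
        Ok(n) => println!("{n} entries"),
        Err(err) => eprintln!("could not read directory: {err}"),
    }
}
```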

The metadata for an entry tells us whether the entry is a file or a directory. If it's a directory, we call read_directory() again with the path to that directory.
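DirEntry also provides a file_type() method. The standard library documents it as free (no extra system call) on Windows and most Unix platforms, and, like DirEntry::metadata(), it does not follow symbolic links. The sketch below, built around a hypothetical classify function, uses it to tell directories, files, and symlinks apart in a single directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: count how many directories, files and
// symbolic links sit directly inside `dir`.
fn classify(dir: &Path) -> io::Result<(usize, usize, usize)> {
    let (mut dirs, mut files, mut links) = (0, 0, 0);
    for entry in fs::read_dir(dir)? {
        // file_type() does not follow symlinks, so a link to a
        // directory is reported as a symlink rather than a directory.
        let file_type = entry?.file_type()?;
        if file_type.is_dir() {
            dirs += 1;
        } else if file_type.is_file() {
            files += 1;
        } else if file_type.is_symlink() {
            links += 1;
        }
        // Other kinds (sockets, FIFOs, ...) are not counted.
    }
    Ok((dirs, files, links))
}

fn main() -> io::Result<()> {
    let (dirs, files, links) = classify(Path::new("."))?;
    println!("{dirs} directories, {files} files, {links} symlinks");
    Ok(())
}
```

Because neither method follows symlinks, a symlink that points at a directory is not reported as a directory, so a walk like read_directory() will not recurse through it.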

Note that many of these functions, including read_dir(), return a Result that should be handled properly, but we'll ignore the errors in this program to keep the code simple.
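For comparison, here is a sketch of the same walk with errors propagated using the ? operator instead of unwrap(). The name count_extensions is hypothetical, and this version swaps in DirEntry::file_type() and Path::extension() for brevity, so it does not skip hidden directories and treats a name like .gitignore as having no extension.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

// Sketch: the recursive walk with errors propagated by `?` instead of
// unwrap(). The first I/O error stops the walk and is returned to the caller.
fn count_extensions(dir: &Path, map: &mut HashMap<String, usize>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_dir() {
            count_extensions(&path, map)?;
        } else if let Some(ext) = path.extension() {
            // Non-UTF-8 extensions are converted lossily.
            *map.entry(ext.to_string_lossy().into_owned()).or_insert(0) += 1;
        }
    }
    Ok(())
}

fn main() {
    let mut map = HashMap::new();
    if let Err(err) = count_extensions(Path::new("."), &mut map) {
        eprintln!("error: {err}");
        std::process::exit(1);
    }
    for (ext, count) in &map {
        println!("{ext}\t{count}");
    }
}
```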

Now that we have a function that can recursively iterate over all folders in a directory, all that remains is to pass the HashMap initialized in main() to read_directory(), where we can keep a count for each file type based on the file's extension. Listing 3 below shows the complete code, which also skips hidden directories (those whose names begin with a dot, such as .git).
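Which part of a file name counts as the "extension" matters for names with several dots or a leading dot. The sketch below compares three options: str::split_once() (text after the first dot), str::rsplit_once() (text after the last dot), and the standard library's Path::extension(). The function names are hypothetical, chosen only for this comparison.

```rust
use std::path::Path;

// Text after the first dot.
fn first_dot(name: &str) -> Option<&str> {
    name.split_once('.').map(|(_, ext)| ext)
}

// Text after the last dot.
fn last_dot(name: &str) -> Option<&str> {
    name.rsplit_once('.').map(|(_, ext)| ext)
}

// Path::extension ignores a single leading dot, so ".gitignore" has none.
fn std_extension(name: &str) -> Option<&str> {
    Path::new(name).extension().and_then(|ext| ext.to_str())
}

fn main() {
    for name in ["main.rs", "archive.tar.gz", ".gitignore", "Makefile"] {
        println!(
            "{name:16} {:?} {:?} {:?}",
            first_dot(name),
            last_dot(name),
            std_extension(name)
        );
    }
}
```

For archive.tar.gz the three return tar.gz, gz, and gz respectively; for .gitignore, Path::extension() returns None while the two string methods return gitignore.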

Listing 3

use std::collections::HashMap;
use std::env;
use std::fs;
use std::process;

fn main() {
    let args: Vec<String> = env::args().collect();

    // Expect the directory to scan as the first argument.
    let path = match args.get(1) {
        Some(path) => path,
        None => {
            eprintln!("please provide a directory to scan");
            process::exit(1);
        }
    };

    // Maps each file extension to the number of files seen with it.
    let mut map: HashMap<String, usize> = HashMap::new();

    read_directory(path, &mut map);

    for (ext, count) in map {
        println!("{ext}\t\t{count}");
    }
}

fn read_directory(path: &str, map: &mut HashMap<String, usize>) {
    let dir = fs::read_dir(path).unwrap();

    for result in dir {
        let entry = result.unwrap();
        let metadata = entry.metadata().unwrap();

        if metadata.is_dir() {
            let dir_name = entry.file_name().to_str().unwrap().to_string();

            // Skip hidden directories such as .git.
            if !dir_name.starts_with('.') {
                read_directory(entry.path().to_str().unwrap(), map);
            }
        } else {
            let file_name = entry.file_name().to_str().unwrap().to_string();

            // Use the text after the last dot as the extension,
            // so "notes.v2.txt" is counted as "txt".
            if let Some((_stem, ext)) = file_name.rsplit_once('.') {
                // Insert 0 for a new extension, then add one.
                *map.entry(ext.to_string()).or_insert(0) += 1;
            }
        }
    }
}
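HashMap does not iterate in any particular order, so the table printed by listing 3 can come out in a different order from run to run. One option, sketched below with a hypothetical sorted_counts helper and made-up sample counts, is to copy the entries into a Vec and sort it by count, highest first, breaking ties alphabetically.

```rust
use std::collections::HashMap;

// Hypothetical helper: turn the counts into a list ordered by count
// (highest first), breaking ties alphabetically by extension.
fn sorted_counts(map: &HashMap<String, usize>) -> Vec<(&str, usize)> {
    let mut rows: Vec<(&str, usize)> = map
        .iter()
        .map(|(ext, &count)| (ext.as_str(), count))
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    rows
}

fn main() {
    // Sample counts for illustration only.
    let mut map: HashMap<String, usize> = HashMap::new();
    map.insert("rs".to_string(), 12);
    map.insert("toml".to_string(), 1);
    map.insert("md".to_string(), 3);
    map.insert("lock".to_string(), 1);

    for (ext, count) in sorted_counts(&map) {
        println!("{ext}\t\t{count}");
    }
}
```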
