Recursively calculate checksums for all files in a directory tree

Recently, I had reason to doubt the integrity of the weekly backup that I make of my home directory. I thought that checking it would be a simple matter of running

md5sum * > homeChecksums.md5

on my home directory, followed by

md5sum -c homeChecksums.md5

on the backup directory, but no such luck. md5sum doesn’t operate recursively on a directory, nor does it have a -R option to force it to do so, as some other Linux programs do.

Now, there’s probably some bash scripting wizardry that could accomplish what I want in five lines or less, but I absolutely loathe bash scripting and so decided to write a simple program in C# to accomplish what I wanted instead.

What started out as a simple program has grown somewhat and should now be fairly robust and user-friendly. It sanity-checks all command line arguments, should catch all exceptions possible during normal operation and presents the user with a clear summary of what it has done/found when it finishes executing. The only thing it can’t handle at present is being run on special directories like /dev, /proc, or /sys.

Program Information

The program can operate in three different modes:

  1. Given a source directory, it generates checksums for all files within that directory tree and writes them to a file.
  2. Given both a source and a destination directory, it compares the checksums of all files within the source tree with those of the corresponding files in the destination tree. Optionally, it also writes the checksums for all files within the source directory to a file.
  3. Given a destination directory, it checks the checksums for all files within that directory against a list provided from a file.
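
As a rough illustration of what Mode 1 involves, here is a minimal sketch in Python (used purely for brevity; the actual program is written in C# and may work quite differently). The function names md5_of_file and checksum_tree are made up for this example.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each regular file under root (path relative to root) to its MD5 digest."""
    checksums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if not os.path.isfile(full_path):
                continue  # skip broken symlinks, sockets and other special entries
            checksums[os.path.relpath(full_path, root)] = md5_of_file(full_path)
    return checksums
```

Storing paths relative to the root is a design choice of this sketch: it lets the same mapping be compared against a backup that lives under a different directory.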

Depending on the mode, the program presents the user with a summary of what it has found. This summary may include:

  • The number of files and directories processed
  • Files that have changed between the source and destination folders
  • Files present in either the source or destination directory but absent in the other
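
The last two categories in the list above boil down to a few set operations. The sketch below, in Python rather than the program’s own C#, assumes each tree has already been reduced to a mapping from relative path to checksum; compare_checksums and the placeholder digests are hypothetical and only serve the example.

```python
def compare_checksums(source, destination):
    """Classify relative paths from two {path: digest} mappings."""
    source_paths = set(source)
    destination_paths = set(destination)
    common = source_paths & destination_paths
    return {
        # Present on both sides, but the digests differ.
        "changed": sorted(p for p in common if source[p] != destination[p]),
        "only_in_source": sorted(source_paths - destination_paths),
        "only_in_destination": sorted(destination_paths - source_paths),
    }


# Placeholder digests, shortened for readability.
source = {"notes.txt": "aaa", "photo.jpg": "bbb", "old.log": "ccc"}
backup = {"notes.txt": "aaa", "photo.jpg": "zzz", "new.cfg": "ddd"}
print(compare_checksums(source, backup))
# {'changed': ['photo.jpg'], 'only_in_source': ['old.log'], 'only_in_destination': ['new.cfg']}
```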

When the program is set to output checksums to a file, I have ensured that the file created can be read by md5sum when that utility is used with the -c option.
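
For reference, each line that md5sum writes and later checks consists of the 32-character hex digest, a space, a second space (text mode) or an asterisk (binary mode), and then the file name. A minimal Python sketch of writing and reading such lines follows; the function names are hypothetical, and the sketch ignores the escaping that md5sum applies to file names containing backslashes or newlines.

```python
import string


def format_md5sum_line(digest, path):
    """Render one entry as md5sum prints it in text mode: digest, two spaces, path."""
    return f"{digest}  {path}\n"


def parse_md5sum_line(line):
    """Split one md5sum entry into (digest, path); the path may itself contain spaces."""
    line = line.rstrip("\n")
    digest, marker, path = line[:32], line[32:34], line[34:]
    is_hex = len(digest) == 32 and all(c in string.hexdigits for c in digest)
    if not is_hex or marker not in ("  ", " *") or not path:
        raise ValueError(f"not an md5sum entry: {line!r}")
    return digest.lower(), path
```

Slicing at fixed offsets, rather than splitting on whitespace, keeps file names with spaces intact.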

I have attempted to add a degree of cross-platform compatibility by including an option (which also serves as a fallback) to use .NET’s built-in checksum functions instead of the Linux md5sum utility. I haven’t yet had the opportunity to test this thoroughly on Windows, but from what I’ve seen so far, there are some problems.

The program can also run multiple checksum-generating processes simultaneously, which can reduce execution time on multi-core computers. The following chart illustrates the benefit on a quad-core Intel 2500K for a 32 GB folder containing nearly 53,000 files. Also of note is the relative performance of md5sum and .NET’s own checksum functions.

Figure: Chart illustrating the advantage of running multiple simultaneous checksum-generating processes
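
The general idea of hashing several files at once can be sketched in a few lines of Python. Note that this is not how the program does it: the program launches multiple checksum-generating processes, whereas this sketch uses a thread pool inside a single process, relying on CPython’s hashlib releasing the interpreter lock while it hashes large buffers. The names checksum_files and workers are assumptions made for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_files(paths, workers=4):
    """Hash the given files with up to `workers` files in flight at once."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(md5_of_file, paths)  # results come back in input order
        return dict(zip(paths, digests))
```

Whether extra workers actually help depends on whether the CPU or the disk is the bottleneck, so it is worth measuring on the hardware in question.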

Usage Examples

Mode 1

mono /path/to/RecursiveChecksummer.exe -s /path/to/HomeDir -f /path/to/fileToWriteChecksumsOfHomeDirTo.txt -c 2

Mode 2

mono /path/to/RecursiveChecksummer.exe -s /path/to/HomeDir -d /path/to/backupOfHomeDir -n

Mode 3

mono /path/to/RecursiveChecksummer.exe -d /path/to/backupOfHome -f /path/to/fileContainingChecksumsOfHomeDir -c 4

Linux users who do not want to prefix every instance of the command with “mono” should read the “Registering .exe as non-native binaries” section of the Mono documentation.

Download the Code

The code for the program is available on GitHub. To compile it, you’ll need either MonoDevelop or Visual C# Express, along with version 4.0 or higher of the Mono/.NET framework.
