Recently, I had reason to doubt the integrity of the weekly backup that I make of my home directory. To check the integrity of the backup I thought it would be a simple matter of running
md5sum * > homeChecksums.md5
on my home directory, followed by
md5sum -c homeChecksums.md5
on the backup directory, but no such luck. md5sum doesn’t automatically operate recursively on a directory, nor is there a -R option to force it to do so, as some other Linux programs have.
Now, there’s probably some bash scripting wizardry that could accomplish what I want in five lines or less, but I absolutely loathe bash scripting and so decided to write a simple program in C# to accomplish what I wanted instead.
What started out as a simple program has grown somewhat and should now be fairly robust and user-friendly. It sanity-checks all command line arguments, should catch all exceptions possible during normal operation and presents the user with a clear summary of what it has done/found when it finishes executing. The only thing it can’t handle at present is being run on special directories like /dev, /proc, or /sys.
The program can operate in three different modes:
- Given a source directory, it generates checksums for all files within that directory tree and writes them to a file.
- Given both a source and a destination directory, it compares checksums for all files within the source tree with those in the destination tree. Optionally it will also write the checksums for all files within the source directory to a file.
- Given a destination directory, it checks the checksums for all files within that directory against a list provided from a file.
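To make the first mode concrete, here is a minimal sketch in Python rather than the program's own C#. It walks a directory tree and computes an MD5 digest for every file it finds. The function names md5_of_file and checksum_tree are hypothetical and not taken from the program, and the sketch does none of the error handling described above (unreadable files and broken symlinks will simply raise).

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each file's path (relative to root, '/'-separated) to its MD5 digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            sums[rel] = md5_of_file(full)
    return sums
```

Keying the result by relative path is a design choice of this sketch: it lets the checksums of a source tree be compared directly with those of a backup stored under a different root.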
Depending on the mode, the program presents the user with a summary of what it has found. This summary may include:
- The number of files and directories processed
- Files that have changed between the source and destination folders
- Files present in either the source or destination directory but absent in the other
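The comparison behind such a summary can be sketched with plain set operations. The Python below is only an illustration, assuming each tree has already been reduced to a dictionary of relative path to checksum; compare_checksums and the keys of its result are hypothetical names, not part of the program.

```python
def compare_checksums(source, destination):
    """Compare two {relative_path: checksum} mappings and classify the differences."""
    in_both = source.keys() & destination.keys()
    return {
        # Same path on both sides, but the contents differ.
        "changed": sorted(p for p in in_both if source[p] != destination[p]),
        # Present on one side only.
        "only_in_source": sorted(source.keys() - destination.keys()),
        "only_in_destination": sorted(destination.keys() - source.keys()),
    }


# Example data (made up for illustration).
home = {"notes.txt": "aaa", "photo.jpg": "bbb", "todo.txt": "ccc"}
backup = {"notes.txt": "aaa", "photo.jpg": "xxx", "old.log": "ddd"}
report = compare_checksums(home, backup)
```

With the example data, photo.jpg is reported as changed, todo.txt as present only in the source, and old.log as present only in the destination.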
When the program is set to output checksums to a file, I have ensured that the file created can be read by md5sum when that utility is used with the -c option.
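For reference, each line that GNU md5sum writes consists of the 32-character hexadecimal digest, a space, a mode character (a space for text mode or an asterisk for binary mode) and then the file name. The following Python sketch writes and reads lines in that layout. The function names are hypothetical, and the sketch deliberately ignores the escaping md5sum applies to file names containing backslashes or newlines.

```python
import string


def format_md5sum_line(digest, path):
    """Build one checksum line in md5sum's text-mode layout: digest, two spaces, path."""
    return f"{digest}  {path}\n"


def parse_md5sum_line(line):
    """Split one md5sum-style line into (digest, path); raise ValueError if malformed."""
    line = line.rstrip("\n")
    digest, rest = line[:32], line[32:]
    is_hex = len(digest) == 32 and all(c in string.hexdigits for c in digest)
    # After the digest: a space, then ' ' (text mode) or '*' (binary mode), then the name.
    if not is_hex or len(rest) < 3 or rest[0] != " " or rest[1] not in " *":
        raise ValueError(f"not an md5sum-style line: {line!r}")
    return digest.lower(), rest[2:]
```

A file name that itself contains spaces survives the round trip, because the parser relies on the fixed digest width rather than splitting on whitespace.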
I have attempted to add a degree of cross-platform compatibility by including an option (which also acts as a fallback) to use .NET’s built-in checksum functions instead of the Linux md5sum utility. I haven’t yet had the opportunity to test this thoroughly on Windows, but from what I’ve seen so far, there are some problems.
The program can also run multiple checksum-generating processes simultaneously, which can reduce execution time on multi-core computers. On a quad-core Intel 2500K, the benefits of running multiple checksum-generating processes simultaneously on a 32 GB folder containing nearly 53,000 files are illustrated by the following chart. Also of note is the relative performance of md5sum and .NET’s own checksum generation functions.
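The general idea of hashing several files at once can be sketched as follows. Note that this Python illustration swaps in a thread pool, whereas the program described here launches multiple checksum processes; checksum_files and its workers parameter are hypothetical names. Python's hashlib releases the interpreter lock while hashing large buffers, so threads can overlap in practice, but any speed-up depends on the disk as much as on the CPU.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_files(paths, workers=4):
    """Hash several files concurrently and return {path: digest}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input paths.
        digests = pool.map(md5_of_file, paths)
        return dict(zip(paths, digests))
```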
Example invocations for the three modes, in the order they were listed earlier:
mono /path/to/RecursiveChecksummer.exe -s /path/to/HomeDir -f /path/to/fileToWriteChecksumsOfHomeDirTo.txt -c 2
mono /path/to/RecursiveChecksummer.exe -s /path/to/HomeDir -d /path/to/backupOfHomeDir -n
mono /path/to/RecursiveChecksummer.exe -d /path/to/backupOfHome -f /path/to/fileContainingChecksumsOfHomeDir -c 4
Linux users who do not want to prefix every instance of the command with “mono” should read the “Registering .exe as non-native binaries” section of the Mono documentation.