Thursday, September 20, 2012

GetLMD, a File Hashing Utility

Want an easy way to find duplicate files, or check for errors? Try GetLMD. It's a command line utility that I wrote long ago, and only just got around to publishing. It was intended to demonstrate the LMD hash family, so the code is written in a deliberately plain manner that forgoes some obvious optimizations in exchange for clarity. Nevertheless, it's very fast. Here is some sample output:

GetLMD build 47
Computes the LMD/LMD2/LMD3 of a file or directory.
Copyright (c) 2012 Russell Leidich, all rights reserved.

D2E2C9132E17DF1A: getlmd/build.h
BE37DA2850485A67: getlmd/constant.h
94EED84F26B6CC2C: getlmd/COPYING
844D8481E6860212: getlmd/file_sys.h
5056B4444D7F617C: getlmd/getlmd.c
6E6A243555254A52: getlmd/getlmd.exe
9A74D885D069BF92: getlmd/getlmd32
F9EA958FDAC7F040: getlmd/getlmd64
373446334609A081: getlmd/
93E60B9867442314: getlmd/
253A2F6F4ED2A8BC: getlmd/readme.txt
59997B11FC9D6990: getlmd/win32_build.bat

476542E8D2313940 (Sum of file-size-hashed (LMD2)s modulo 2^64.)

The source code is included under the licensing terms explained in COPYING. It seems to work in a manner consistent with the hash definitions provided in this blog, and has been tested against the LMD2 and LMD3 reference code. See readme.txt for details. This was only ever intended as a demo, so the code could be much more elegant and faster. Nevertheless, it's quite useful.

For example, you can capture the hashes to a file, like this (in 64-bit Linux):

./getlmd64 your_folder >your_output_file.txt

Or detect duplicate files by sorting the output and looking for the same LMD on two adjacent lines:

./getlmd64 your_folder |sort
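
If scanning the sorted list by eye gets tedious, the same check can be scripted. What follows is only a rough sketch, not part of GetLMD: it assumes each hash line has the form shown in the sample output above (16 hex digits, a colon, a space, then the path), and the name find_duplicates is made up for illustration. Lines that don't fit that pattern, such as the banner and the final sum, are skipped.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the sample output above:
# 16 hex digits, a colon, a space, then the path.
HASH_LINE = re.compile(r"^([0-9A-Fa-f]{16}): (.+)$")


def find_duplicates(lines):
    """Group paths by hash; return only the hashes that occur more than once.

    Hypothetical helper, not part of GetLMD itself.
    """
    paths_by_hash = defaultdict(list)
    for line in lines:
        match = HASH_LINE.match(line.rstrip("\r\n"))
        if match is None:
            continue  # banner, blank line, or the final sum line
        paths_by_hash[match.group(1).upper()].append(match.group(2))
    return {h: paths for h, paths in paths_by_hash.items() if len(paths) > 1}
```

For instance, find_duplicates(open("your_output_file.txt")) would return a dictionary mapping each repeated hash to the list of paths that share it. No sorting is needed first, since the grouping is by hash value. As with any hash, a match means the files are very probably identical, so compare them byte for byte before deleting anything.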

I don't have time to do further development, but I'll try to help if you find any bugs.

Enjoy, and by all means feel free to build something better with it!