Sunday, May 17, 2015

md5sum bash script to create check file for a directory tree

A quick script that walks the current directory and all descendant directories and writes a single "verify-tree.md5" check file using md5sum. If a check file already exists, it is first renamed to "verify-tree.yyyymmdd-hhmmss.md5" (stamped with the old file's modification time) for safekeeping.

Check files are useful whenever you have a set of files that will not (or should not) change over time, such as files written to archive tape, disk, optical media, or a flash drive.  While the media and file system may do some checking of their own, it's good to have a second layer that is under your control and can be queried.

(If you want file recovery features, you should look into MultiPar or par2j.)

The script's output looks like:


The file that is created (verify-tree.md5) is a standard md5sum file and can be read by most software that understands that format.  Some software cannot handle paths in sub-folders, however, so you may have to use the md5sum program itself to do the verification.
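As a rough illustration of verifying by hand with md5sum, here is a small sketch that builds a check file in a scratch directory, verifies it, and then verifies it again after one file has been changed. The scratch directory and the file names (a.txt, sub/b.txt) are made up for the demo; only the md5sum and find invocations mirror the script.

```shell
#!/bin/bash
# Work in a throwaway directory so no real files are touched (demo assumption).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p sub
echo "hello" > a.txt
echo "world" > sub/b.txt

# Build a check file the same way the script does.
find . -type f -not -name "verify-tree*.md5" \
    -exec md5sum "{}" \; > verify-tree.md5

# --quiet hides the per-file "OK" lines; the exit status is 0 when every hash matches.
md5sum -c --quiet verify-tree.md5 && echo "tree is intact"

# Change one file and check again: md5sum names the failed file and exits non-zero.
echo "tampered" > sub/b.txt
if ! md5sum -c --quiet verify-tree.md5; then
    echo "at least one file changed"
fi
```

The paths in the check file are relative, so md5sum -c has to be run from the directory that holds verify-tree.md5.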

----------------------------------------------------------------
#!/bin/bash

# stop script on errors
set -e

PROG=md5sum
FILENAME=verify-tree

# move an existing check file aside, stamped with its modification time
if [[ -e "${FILENAME}.md5" ]]; then
    mv "${FILENAME}.md5" \
        "${FILENAME}.$(date --reference="${FILENAME}.md5" '+%Y%m%d-%H%M%S').md5"
fi

echo ""
echo "Output Filename: ${FILENAME}.md5"
echo "Files Found: $(find . -type f -not -name "${FILENAME}*.md5" | wc -l)"
echo "Size: $(du -chs . | grep 'total')"

time find . -type f -not -name "${FILENAME}*.md5" \
    -exec ${PROG} "{}" \; >> "${FILENAME}.md5"

echo ""
echo "Files Processed: $(wc -l < "${FILENAME}.md5")"

echo ""
echo "Checking..."
time ${PROG} -c --quiet "${FILENAME}.md5"
echo ""
echo "All files verified."
----------------------------------------------------------------

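Note: There are multiple ways to write the md5sum line.  The script uses the first form below, which starts one md5sum process per file.  The xargs forms hand many files to each md5sum invocation and are probably the better choice, but I have not tested them.  The last form splits file names on whitespace, so prefer the -print0 / -0 form if any names contain spaces.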

find . -type f -not -name "${FILENAME}*.md5" -exec ${PROG} "{}" \; >> "${FILENAME}.md5"

find . -type f -not -name "${FILENAME}*.md5" -print0 | xargs -0 ${PROG} >> "${FILENAME}.md5"

find . -type f -not -name "${FILENAME}*.md5" | xargs ${PROG} >> "${FILENAME}.md5"
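To show how these variants differ on awkward file names, here is a small untested-in-production sketch run in a scratch directory. The file names and the .out/.err output files are invented for the demo. It also shows a further option not listed above, the POSIX `-exec ... {} +` form of find, which batches file names much like xargs does.

```shell
#!/bin/bash
# Scratch directory with one ordinary name and one name containing spaces.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
echo "one" > "plain.txt"
echo "two" > "name with spaces.txt"

# NUL-delimited names: each file name reaches md5sum intact.
find . -type f -name "*.txt" -print0 | xargs -0 md5sum > safe.out

# Whitespace-delimited names: "name with spaces.txt" is split into three
# arguments, so md5sum complains about missing files and xargs exits non-zero.
find . -type f -name "*.txt" | xargs md5sum > unsafe.out 2> unsafe.err || true

# find's own batching form: no pipe, and names are passed through unchanged.
find . -type f -name "*.txt" -exec md5sum {} + > exec.out

echo "print0/xargs -0 hashed: $(wc -l < safe.out) file(s)"
echo "plain xargs hashed:     $(wc -l < unsafe.out) file(s)"
echo "find -exec + hashed:    $(wc -l < exec.out) file(s)"
```

Only the plain xargs form loses the file with spaces in its name; the other two hash both files.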

It's also easy to adapt this script to work with sha256sum or sha1sum.
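One possible way to make that adaptation, sketched here under a few assumptions: the EXT variable and the idea of deriving the file extension from the program name are my own additions, and the scratch directory and file.txt exist only so the example can run on its own.

```shell
#!/bin/bash
set -e

PROG=sha256sum            # could also be sha1sum or md5sum
EXT="${PROG%sum}"         # hypothetical helper: sha256sum -> sha256, md5sum -> md5
FILENAME=verify-tree

# Demo data in a throwaway directory (assumption, not part of the original script).
demo_dir=$(mktemp -d)
cd "$demo_dir"
echo "data" > file.txt

# Same find invocation as the script, with the extension following the hash tool.
find . -type f -not -name "${FILENAME}*.${EXT}" \
    -exec "${PROG}" "{}" \; > "${FILENAME}.${EXT}"

# The coreutils hash tools share the -c / --quiet checking options.
"${PROG}" -c --quiet "${FILENAME}.${EXT}"
echo "Wrote and verified ${FILENAME}.${EXT}"
```

Keeping the extension in step with the tool avoids leaving SHA-256 hashes in a file named .md5, which would confuse other checking software.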


The verification script is a trimmed-down version of the original script.  Run it from the same directory as the check file, since the recorded paths are relative:

----------------------------------------------------------------
#!/bin/bash

# stop script on errors
set -e

PROG=md5sum
FILENAME=verify-tree

echo ""
echo "Checking: ${FILENAME}.md5"
echo "File Hashes: $(wc -l < "${FILENAME}.md5")"
time ${PROG} -c --quiet "${FILENAME}.md5"
echo ""
echo "All files verified."
----------------------------------------------------------------

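I have tested this in Cygwin 64-bit on Windows 7 Professional 64-bit, and it should also work on Linux and other Unix-like systems as long as the md5sum command is available.  The move-aside step uses "date --reference", which is a GNU extension, so BSD-style userlands such as OS X may need GNU coreutils installed.  Otherwise the script is conservative in design, with very few "tricks", so it should be portable.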
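If you want the script to fail early when no suitable command is present, a check along these lines could go near the top. This is only a sketch: the candidate name gmd5sum (the prefixed name some package managers give GNU md5sum) is an assumption about your system, not something the script above relies on.

```shell
#!/bin/bash
# Pick the first md5sum-compatible command found on the PATH.
# "gmd5sum" is an assumed alternative name; adjust the list for your system.
PROG=""
for candidate in md5sum gmd5sum; do
    if command -v "$candidate" >/dev/null 2>&1; then
        PROG="$candidate"
        break
    fi
done

if [[ -z "$PROG" ]]; then
    echo "No md5sum-compatible command found." >&2
    exit 1
fi

echo "Using: $PROG"
```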
