Wednesday, November 03, 2010

bash script to verify ZIP or RAR archives

This Bash script tests every ZIP file in the current directory and all subdirectories. Valid files are left alone. Files with problems (encrypted, password-protected, CRC errors, not a ZIP file) are renamed with an extra extension that names the problem: .encrypted, .password, .broken or .wrongformat.

#!/bin/bash

# IFS= and -r keep leading/trailing whitespace and backslashes in file names intact
find . -type f -name '*.zip' -print0 | while IFS= read -r -d '' file
do
    # skip loop iteration if file no longer exists
    if [[ ! -f "$file" ]] ; then continue; fi
    #echo "$file"
    
    unzip -t -qq "$file"
    RETVAL=$?

    # return values (error codes) are generally unique to each archive tool
    case $RETVAL in
        # 0=success, archive file is okay
        0) 
            echo "OK: $file"
            ;;
        
        # codes that indicate a broken archive
        3) 
            echo "BROKEN($RETVAL): $file"
            mv "$file" "$file.broken"
            ;;
            
        # codes that indicate file is not a supported archive
        9) 
            echo "WRONGFORMAT($RETVAL): $file"
            mv "$file" "$file.wrongformat"
            ;;
            
        # user pressed ctrl-C or break or killed the process    
        80) 
            echo "USER ABORT: $file"
            ;;
            
        # 81: unsupported compression method or unsupported encryption
        81) 
            echo "ENCRYPTED($RETVAL): $file"
            mv "$file" "$file.encrypted"
            ;;
            
        # 82: no files could be tested because the password was wrong or missing
        82) 
            echo "PASSWORD($RETVAL): $file"
            mv "$file" "$file.password"
            ;;
            
        # other error codes
        *) 
            echo "ERROR($RETVAL): $file"
            ;;
    esac
done
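
The exit-code handling above can be pulled into a small function and tried out without touching any files. This is only a sketch: the function names suffix_for_unzip_status and plan_rename are made up for illustration, and the code-to-suffix mapping is simply copied from the script above.

```bash
#!/bin/bash

# Map an unzip exit status to the suffix the script appends.
# Prints an empty line for statuses where the file is left alone.
suffix_for_unzip_status() {
    case "$1" in
        3)  printf '%s\n' ".broken" ;;
        9)  printf '%s\n' ".wrongformat" ;;
        81) printf '%s\n' ".encrypted" ;;
        82) printf '%s\n' ".password" ;;
        *)  printf '\n' ;;
    esac
}

# Dry run: print the rename that would happen instead of performing it.
# $1 = file name, $2 = exit status reported by unzip
plan_rename() {
    suffix=$(suffix_for_unzip_status "$2")
    if [ -n "$suffix" ]; then
        printf 'mv %s %s\n' "$1" "$1$suffix"
    else
        printf 'keep %s\n' "$1"
    fi
}
```

Inside the loop this could be called as: unzip -t -qq "$file"; plan_rename "$file" $?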

The same script, adapted to handle RAR files. The exit codes are defined as an enum in "errhnd.hpp" in the UnRAR source. An approximate list is:

SUCCESS=0, WARNING=1, FATAL_ERROR=2, CRC_ERROR=3, LOCK_ERROR=4, WRITE_ERROR=5, OPEN_ERROR=6, USER_ERROR=7, MEMORY_ERROR=8, CREATE_ERROR=9, NO_FILES_ERROR=10, USER_BREAK=255

Code 3 is commonly returned if files within the RAR archive have errors. Code 10 may indicate that the RAR archive is not actually a RAR file (it might be a misnamed ZIP or 7Z archive).
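
To make the log output easier to read, the numeric code can be translated back into the names listed above. A minimal sketch: rar_exit_name is a hypothetical helper, and the names are the approximate ones from that list, not checked against any particular UnRAR version.

```bash
#!/bin/bash

# Translate an unrar exit status into the (approximate) enum name.
rar_exit_name() {
    case "$1" in
        0)   echo "SUCCESS" ;;
        1)   echo "WARNING" ;;
        2)   echo "FATAL_ERROR" ;;
        3)   echo "CRC_ERROR" ;;
        4)   echo "LOCK_ERROR" ;;
        5)   echo "WRITE_ERROR" ;;
        6)   echo "OPEN_ERROR" ;;
        7)   echo "USER_ERROR" ;;
        8)   echo "MEMORY_ERROR" ;;
        9)   echo "CREATE_ERROR" ;;
        10)  echo "NO_FILES_ERROR" ;;
        255) echo "USER_BREAK" ;;
        *)   echo "UNKNOWN" ;;   # anything not in the list above
    esac
}
```

For example, the catch-all branch of the script could print: echo "ERROR($(rar_exit_name "$RETVAL")): $file"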

#!/bin/bash

# IFS= and -r keep leading/trailing whitespace and backslashes in file names intact
find . -type f -name '*.rar' -print0 | while IFS= read -r -d '' file
do
    # skip loop iteration if file no longer exists
    if [[ ! -f "$file" ]] ; then continue; fi
    #echo "$file"
    
    unrar t -idq "$file"
    RETVAL=$?
    
    case $RETVAL in
        # 0=success, archive file is okay
        0) 
            echo "OK: $file"
            ;;
        
        # codes that indicate a broken archive
        3) 
            echo "BROKEN($RETVAL): $file"
            mv "$file" "$file.broken"
            ;;
            
        # probably indicates wrong format (ZIP or 7Z file as RAR)
        10) 
            echo "WRONGFORMAT($RETVAL): $file"
            mv "$file" "$file.wrongformat"
            ;;
            
        # user pressed ctrl-C or break or killed the process    
        255) 
            echo "USER ABORT: $file"
            ;;
            
        # other errors
        *) 
            echo "ERROR($RETVAL): $file"
            ;;
    esac
done
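
When unrar reports code 10, the first few bytes of the file can hint at what it really is. The helper below is a sketch under assumptions (sniff_archive_type is an invented name) that compares the leading bytes with the well-known ZIP, RAR and 7z signatures. It only recognizes the common ZIP local-file-header signature, so empty or spanned ZIP archives come back as "unknown".

```bash
#!/bin/bash

# Guess the archive type from the leading bytes of a file.
# Prints one of: zip, rar, 7z, unknown
sniff_archive_type() {
    # first 6 bytes as lowercase hex digits without separators
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        504b0304*)    echo "zip" ;;      # "PK" 03 04
        526172211a07) echo "rar" ;;      # "Rar!" 1A 07
        377abcaf271c) echo "7z" ;;       # "7z" BC AF 27 1C
        *)            echo "unknown" ;;
    esac
}
```

In the code 10 branch, the result could be appended to the message, e.g. echo "WRONGFORMAT($RETVAL): $file (looks like $(sniff_archive_type "$file"))".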

Both scripts work fine under Cygwin, but to get "unrar" for Cygwin you will have to download the UnRAR portable source from RARLAB and build it yourself.
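
Since a missing "unrar" or "unzip" would make every archive fall into the catch-all error branch, it may be worth checking for the tools before the loop starts. A small sketch; require_tools is a name invented here, not part of either script.

```bash
#!/bin/bash

# Return 0 if every named command is on the PATH; otherwise report the
# missing ones on stderr and return 1.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

At the top of either script this could be used as: require_tools unzip unrar || exit 1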

As a final note, here is the script to undo the name change, for cases where the renaming doesn't work as expected. It searches for any files ending in '.broken' and strips that suffix from the file name. This can be used for other mass-renaming activities with minor edits. It may be useful to change sed to use ':' as the delimiter instead of '/' (i.e. sed 's:\.broken$::') to make things easier to write, for example when the pattern itself contains slashes.

#!/bin/bash

find . -type f -name '*.broken' -print0 | while IFS= read -r -d '' file
do
    # skip loop if file no longer exists
    if [[ ! -f "$file" ]] ; then continue; fi
    #echo "$file"
    
    # the dot is escaped so that it only matches a literal '.'
    NEWF=$(echo "$file" | sed 's/\.broken$//')
    mv "$file" "$NEWF"
done
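
The same undo step can also be done without sed, using the shell's own suffix removal. This is a sketch rather than the method used above, and strip_suffix is a name chosen here for illustration. Because the suffix is quoted inside the expansion, it is matched literally, so the dot cannot stand in for an arbitrary character.

```bash
#!/bin/bash

# Print $1 with the literal suffix $2 removed.
# The name is printed unchanged if it does not end in $2.
strip_suffix() {
    printf '%s\n' "${1%"$2"}"
}
```

Inside the loop the rename would then read: mv "$file" "$(strip_suffix "$file" .broken)"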

4 comments:

Unknown said...

Hello, great script, but I tried your script with (7z t -r -psecret "$file"), and after showing the result for the first archive it finds, it stops and won't test the remaining archives. Please help me.

Thomas said...

I would suggest not using the -r flag. From the 7z man page: "Recurse subdirectories (CAUTION: this flag does not do what you think, avoid using it)".

Other than that, you should uncomment the "echo" line and comment out the "7z" line and see whether it finds all of the files in the directory. Then uncomment the "7z" line to see if it tests properly.

Unknown said...

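Thank you for your help. I was using the command "read -p "blabla"" and this was breaking the loop. Here is the final script that works:
#!/bin/bash
error=0
# the regex requires a literal dot before the extension
find . -type f -regex ".*\.\(zip\|7z\)$" -print0 | {
    while IFS= read -r -d '' file
    do
        7z t -ppassword "$file" >> result.txt
        RETVAL=$?
        case $RETVAL in
            0)
                echo "OK: $file"
                ;;
            2)
                echo "error code ($RETVAL): $file"
                error=1
                ;;
            # any other non-zero code (for example 1 = warning) is reported as well
            *)
                echo "error code ($RETVAL): $file"
                error=1
                ;;
        esac
    done
    # the braces keep this check in the same subshell as the loop,
    # so the value of $error set inside the loop is still visible here
    if [ "$error" = "1" ]; then
        echo "Error, see result.txt"
    else
        echo "all archives are ok"
    fi
}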
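
The "read -p" problem described in this comment comes from the prompt and the loop sharing one standard input: the inner read takes its answer from the pipe that carries the file names. The sketch below reproduces that with a newline-separated list (the function names and file names are invented for the demonstration) and shows one possible fix, feeding the list on file descriptor 3. With the NUL-separated list from find -print0, a line-oriented read can swallow everything that is left, which would explain a loop that stops after the first archive.

```bash
#!/bin/bash

# Broken: the inner read shares stdin with the loop, so it swallows
# every second file name coming from the pipe.
count_shared_stdin() {
    printf '%s\n' a.zip b.zip c.zip d.zip | {
        n=0
        while IFS= read -r file; do
            n=$((n + 1))
            read -r answer      # stand-in for: read -p "blabla" answer
        done
        echo "$n"
    }
}

# Fixed: the file list arrives on descriptor 3, which leaves stdin
# free for the prompt.
count_separate_fd() {
    {
        n=0
        while IFS= read -r file <&3; do
            n=$((n + 1))
            read -r answer || true   # reads from the real stdin
        done
        echo "$n"
    } 3<<EOF
a.zip
b.zip
c.zip
d.zip
EOF
}
```

Another option, when the script is run from a terminal, is to leave the loop as it is and point only the prompt at the terminal: read -p "blabla" answer </dev/tty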

Unknown said...

Here is a Windows script, if someone also needs it:
@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
echo checking *.zip *.7z archives in current folder and all subfolders...
SET /A COUNT=0
SET error=0
for /r %%i in (*.zip *.7z) do (
SET /A COUNT+=1
"C:\Program Files\7-Zip\7z.exe" t "%%i" * -r -ppassword > "result !COUNT!.txt"
if ERRORLEVEL 1 call :fail
)
goto lol
:fail
SET error=1
start "" notepad.exe "result %COUNT%.txt"
rem return to the for loop instead of falling through into :lol
goto :eof
:lol
if %error% == 0 (
echo All archives are good. > good.txt
start "" notepad.exe good.txt )
timeout /t 1
del "result *.txt"
del good.txt