There are many ways to checksum a file, and many ready-to-use programs that do it. Sometimes, though, you need to integrate a checksum into your own app. Most of the time we checksum a file to detect corruption after copying, moving or downloading it.
Luckily, Python provides the hashlib module, which implements many different secure hash and message digest algorithms. Among them is an implementation of the MD5 algorithm. MD5 has been widely used in the software world to check whether two files are identical or whether transferred data was saved without corruption.
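To give a feel for that range of algorithms, here is a small sketch that lists the algorithms hashlib guarantees on every platform and builds a hash object by name. The helper name `digest_with` is made up for this example; `hashlib.algorithms_guaranteed` and `hashlib.new` come from the standard library.

```python
import hashlib


def digest_with(name, data):
    """Return the hex digest of `data` using the algorithm called `name`.

    `digest_with` is a hypothetical helper used only in this example.
    """
    h = hashlib.new(name)  # look the algorithm up by its name
    h.update(data)         # `data` must be bytes
    return h.hexdigest()


# Algorithms that hashlib promises to support on all platforms.
print(sorted(hashlib.algorithms_guaranteed))

# The same data hashed with two different algorithms.
print(digest_with("md5", b"hello world"))
print(digest_with("sha256", b"hello world"))
```

The rest of this post sticks to MD5, but the same calls should work for the other fixed-length algorithms in that list.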
MD5 is one of the most common checksums on the web. Many download sites publish an MD5 checksum next to each file they offer, or provide a way to fetch it.
The MD5 interface is really straightforward. Let's see how to checksum a string first:
>>> import hashlib
>>>
>>> # update() needs bytes, so use a bytes literal (required on Python 3)
>>> string = b"Hello world with hashlib.MD5\n"
>>> md5_check = hashlib.md5()
>>> md5_check.update(string)
>>>
>>> md5_check.digest()
b'6\xb7\xf7\xe7\x82\x98\x94\x88O\x1d\x9ak\x19\xb8\xbb\x8c'
>>> md5_check.hexdigest()
'36b7f7e7829894884f1d9a6b19b8bb8c'
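To check that the string was hashed correctly, we can create a file with the same content and run md5sum on it. Note that echo appends a trailing newline, which is why the Python string ends with \n.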
$ echo "Hello world with hashlib.MD5" > test.txt
$ md5sum test.txt
36b7f7e7829894884f1d9a6b19b8bb8c  test.txt
The output is the same as the hexdigest (the hexadecimal representation) from the Python code.
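The same comparison can be done from Python, for example to verify data against a checksum published by a download site. This is only a sketch: `matches_md5` is a hypothetical helper name, and the case and whitespace handling is an assumption about how the published value might be formatted.

```python
import hashlib


def matches_md5(data, expected_hex):
    """Return True if the MD5 of `data` (bytes) equals `expected_hex`.

    `matches_md5` is a hypothetical helper. The comparison ignores case and
    surrounding whitespace, since tools differ in how they print the digest.
    """
    actual = hashlib.md5(data).hexdigest()
    return actual == expected_hex.strip().lower()


# Digest as printed by md5sum in the example above.
published = "36b7f7e7829894884f1d9a6b19b8bb8c"
print(matches_md5(b"Hello world with hashlib.MD5\n", published))
```

A mismatch only says that the data differs from what was published. It does not say where the difference is.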
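On Python 2 there is also a shortcut function in the md5 module itself. Keep in mind that this module was deprecated in Python 2.5 in favour of hashlib and no longer exists in Python 3.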
>>> import md5  # Python 2 only
>>>
>>> md5.new("Hello world with hashlib.MD5\n").hexdigest()
'36b7f7e7829894884f1d9a6b19b8bb8c'
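On Python 3, where the md5 module is not available, a similar one-liner can be written with hashlib alone, because `hashlib.md5` accepts the initial data as an argument. The sketch below also shows hashing a text string. The UTF-8 encoding is an assumption for this example, so use whatever encoding your data really has.

```python
import hashlib

# hashlib.md5 accepts the initial data directly, so no separate update() call
# is needed. The argument must be bytes, not str.
digest = hashlib.md5(b"Hello world with hashlib.MD5\n").hexdigest()
print(digest)

# Text has to be encoded to bytes first; UTF-8 is assumed here.
text = "Hello world with hashlib.MD5\n"
same = hashlib.md5(text.encode("utf-8")).hexdigest() == digest
print(same)
```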
Checksum large files
Sometimes the file to checksum is larger than the available RAM. When that happens, it cannot be read and checksummed all at once. Luckily, it's easy to read the file in chunks and feed each chunk to the same hash object, which produces one final checksum.
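>>> import hashlib
>>>
>>> md5_check = hashlib.md5()
>>> with open('test.txt', "rb") as f:
...     # Read 5 bytes at a time (tiny, just for the demo); read() returns b"" at EOF
...     for chunk in iter(lambda: f.read(5), b""):
...         md5_check.update(chunk)
...
>>> md5_check.hexdigest()
'36b7f7e7829894884f1d9a6b19b8bb8c'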
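The loop above can be wrapped in a reusable function. The following is a minimal sketch, not a definitive implementation: the names `md5_of_file` and `files_match` and the 64 KiB default chunk size are choices made for this example. A larger chunk than 5 bytes simply means fewer read calls.

```python
import hashlib


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the MD5 hex digest of the file at `path`.

    The file is read `chunk_size` bytes at a time, so memory use stays
    roughly constant no matter how large the file is.
    """
    md5_check = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5_check.update(chunk)
    return md5_check.hexdigest()


def files_match(path_a, path_b):
    """Return True if both files have the same MD5 checksum."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

For example, `md5_of_file("test.txt")` should give the same digest as the md5sum call earlier, and `files_match(source, copy)` is one way to check a file after copying it (both paths are placeholders here).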
Hope it helps.