MD5 checksums: help with protecting from data degradation or corruption

Space monkey 12

Estimable
Sep 25, 2015
20
0
4,560
Hi everyone

If I use a program to create MD5 checksums for the master copies of my files before I back them up, and then periodically scan the backups against those same MD5 checksums:

1) Does this ensure that the copies are identical?

2) Would comparing the backups' MD5 checksums to the originals' tell me if a file had become damaged or corrupted?

Thanks
 

Spectre694

Estimable
Feb 28, 2014
167
0
4,710
It won't protect the data from corruption, but it will allow you to tell if it has become corrupted. However, MD5 is considered broken at this point (a way was found to generate collisions). It's probably fine for home files, but it is highly recommended to use the SHA functions instead (SHA-256 in particular, not SHA-0 or SHA-1).

1) Yes. Every file has only one hash, and if anything changes in the file then the hash will, for all practical purposes, be different (two identical files = same hash).

2) Yes. If the backups are exactly the same as the originals, they give the exact same hash as the originals.
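
To illustrate the "two identical files = same hash" point, here is a minimal Python sketch using the standard `hashlib` module. The helper name `file_digest` and the temporary file names are made up for the example; it reads each file in chunks so large videos do not need to fit in memory.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it one chunk at a time."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Demo: write the same bytes to a "master" and a "backup" file (hypothetical
# names), then compare their digests.
with tempfile.TemporaryDirectory() as tmp:
    master = os.path.join(tmp, "master.bin")
    backup = os.path.join(tmp, "backup.bin")
    for path in (master, backup):
        with open(path, "wb") as f:
            f.write(b"example photo bytes" * 1000)
    master_digest = file_digest(master)
    backup_digest = file_digest(backup)

print(master_digest == backup_digest)
```

Passing `algorithm="md5"` would compute an MD5 checksum the same way, on systems where MD5 is enabled.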
 

blazorthon

Distinguished
Sep 24, 2010
761
0
18,960
1. Yes, it ensures they are identical.
2. Yes and no. This would tell you if the file changed. So long as you never change the files yourself, then it would be a decent indicator. The slightest change can completely change the checksum, but might not necessarily be enough to visually corrupt the video. However, if the file does not get corrupted or changed in any other way at all, then the MD5 checksum will never change. So long as that checksum comes back the same, you know the file is exactly as you left it.

Well, technically, it is theoretically possible for the file to change in such a way that the MD5 checksum doesn't change, simply because the MD5 checksum is shorter than the file is, but that's practically impossible unless done intentionally. This is what Spectre694 means by MD5 being "broken": it is possible for someone to intentionally alter a file in such a way that it will retain the proper MD5 checksum. If you're worried about data corruption and not about hackers or some friend who thinks he/she is funny, then it's not a problem.
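
As a rough illustration of "the slightest change can completely change the checksum", this small Python sketch flips a single bit in some made-up data and compares the digests. It uses SHA-256 from the standard `hashlib` module; MD5 behaves the same way for accidental changes. The sample data is arbitrary and only stands in for a real file.

```python
import hashlib

# Arbitrary stand-in for a file's contents (16 KiB of sample bytes).
original = bytes(range(256)) * 64

# Simulate corruption: flip exactly one bit in one byte.
damaged = bytearray(original)
damaged[1000] ^= 0x01

digest_original = hashlib.sha256(original).hexdigest()
digest_damaged = hashlib.sha256(bytes(damaged)).hexdigest()

# Count how many hex digits of the two digests differ.
differing = sum(a != b for a, b in zip(digest_original, digest_damaged))

print(digest_original != digest_damaged)
print(differing, "of", len(digest_original), "hex digits differ")
```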
 
Solution

Space monkey 12

Estimable
Sep 25, 2015
20
0
4,560
OK. No, these files will be archived but accessible online; I will never change a single pixel in any of these pics or videos.

So it is an OK way to check for corruption, and then restore from a backup that has identical checksums if I find a bad checksum?

The program I planned on using is MD5sums. I think it has an option for MD5 or SHA.

So I should use SHA, not MD5?
 

blazorthon

Distinguished
Sep 24, 2010
761
0
18,960
There are multiple versions of SHA. The older ones, SHA-0 and SHA-1, have also been broken. SHA-2 (which includes SHA-256 and SHA-512) or SHA-3 would be better if you want to avoid that issue.

If you're not changing them at all, then yes, this is an effective method of testing for data corruption and you could restore from a backup like you said.
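
The workflow described here (record checksums of the masters once, then re-check the backup later) could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function names `build_manifest` and `verify`, the folder names, and the sample files are all invented for the demo, and it is not how any particular checksum program works internally.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash one file in chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify(root, manifest):
    """Return (changed, missing) lists of relative paths under root."""
    changed, missing = [], []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            missing.append(rel)
        elif sha256_of(full) != expected:
            changed.append(rel)
    return changed, missing


with tempfile.TemporaryDirectory() as tmp:
    master = os.path.join(tmp, "master")
    backup = os.path.join(tmp, "backup")
    os.makedirs(master)
    for name in ("a.jpg", "b.jpg", "c.mp4"):  # made-up sample files
        with open(os.path.join(master, name), "wb") as f:
            f.write(name.encode() * 100)

    manifest = build_manifest(master)
    shutil.copytree(master, backup)
    clean_result = verify(backup, manifest)

    # Simulate damage: overwrite one byte in b.jpg and delete c.mp4.
    with open(os.path.join(backup, "b.jpg"), "r+b") as f:
        f.write(b"X")
    os.remove(os.path.join(backup, "c.mp4"))
    damaged_result = verify(backup, manifest)

print(clean_result)
print(damaged_result)
```

Any file listed as changed or missing would be a candidate for restoring from another copy whose checksum still matches the manifest.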
 

blazorthon

Distinguished
Sep 24, 2010
761
0
18,960

Space monkey 12

Estimable
Sep 25, 2015
20
0
4,560
Thanks, I will look at those now. I used the one I mentioned while we were talking about it. After a scan, a few files report errors:

The Thumbs.db files' checksums do not match. Should I worry about this? Any idea what the file is for? I gather it's a thumbnail database.

Also, a couple of files are reported as "file does not exist", but looking on my NAS they do exist. I gather this was a drop in connection or something?

And I will probably pick a setting in the middle for speed and security, as some files are fairly large: 2 GB videos, plus thousands of 2 MB pictures.
 

blazorthon

Distinguished
Sep 24, 2010
761
0
18,960
I don't know what thumbs.db is relative to your pictures, but if that's just a thumbnail cache, then that's a file that is supposed to change. Maybe your computer is saving thumbnails of new pictures in that file as you put them on the computer, so it changes because of that.
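
If the thumbnail cache is expected to change, one option is simply to leave it out of the check. A minimal Python sketch of that idea, where the ignore list, the function name `files_to_verify`, and the sample paths are assumptions for illustration:

```python
import os

# Assumption: names of files that are expected to change and can be skipped.
IGNORED_NAMES = {"thumbs.db"}


def files_to_verify(paths):
    """Drop paths whose file name (case-insensitive) is on the ignore list."""
    return [p for p in paths if os.path.basename(p).lower() not in IGNORED_NAMES]


# Made-up example paths.
sample = [
    "2015/holiday/IMG_0001.jpg",
    "2015/holiday/Thumbs.db",
    "videos/clip.mp4",
]
kept = files_to_verify(sample)
print(kept)
```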

I have not tried hashing a large file and I don't know how long that will take. I could try running it on one of my Windows ISO image files to try it.

EDIT: I just tried it on a 220 MB installer and it took about 20 seconds with SHA-512, so I guess a 2 GB video will take around three minutes on my computer. For reference, I did it on a somewhat old laptop with an A10-4600 and a cheap OCZ ARC 100 SSD.
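
The estimate is a simple linear scaling of the measured time. The Python sketch below redoes that arithmetic (treating 2 GB as 2000 MB, which is an assumption) and adds a rough way to measure raw hashing speed on your own machine. The helper name `hash_rate_mb_per_s` is invented for the example, and it hashes data already in memory, so it ignores disk and network read speed, which may well dominate on a NAS.

```python
import hashlib
import time

# Figures from the measurement above: 220 MB hashed in about 20 seconds.
measured_mb = 220
measured_seconds = 20
throughput = measured_mb / measured_seconds  # MB per second

# Assumption: 2 GB is treated as 2000 MB for this rough estimate.
estimate_seconds = 2000 / throughput


def hash_rate_mb_per_s(algorithm, size_mb=16):
    """Hash size_mb megabytes of in-memory zeros and return MB per second."""
    data = bytes(1024 * 1024)
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(size_mb):
        h.update(data)
    h.hexdigest()
    elapsed = time.perf_counter() - start
    return size_mb / elapsed


print(f"estimated time for 2 GB: {estimate_seconds:.0f} s")
for name in ("sha256", "sha512"):
    print(name, f"{hash_rate_mb_per_s(name):.0f} MB/s")
```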
 

Space monkey 12

Estimable
Sep 25, 2015
20
0
4,560
Thank you both for the help. Yes, I'm pretty sure Thumbs.db is an image cache; it's not needed to view or restore the pictures anyway.

With the program I mentioned to create and check the MD5 checksums, it took Windows only 30 minutes to scan my thousands of photos (12 GB total).

Checking them on the NAS took over 3 hours :(

Everything was fine except the Thumbs.db files, which are not important.

I will try one of the ones you linked next.

Thanks again