Geek Software of the Week: PhotoRec
Recover your photos!
PhotoRec – Photo Recovery Software
How PhotoRec works
“FAT, NTFS, ext2/ext3/ext4 file systems store files in data blocks (also called clusters under Windows). The cluster or block size remains at a constant number of sectors after being initialized during the formatting of the file system. In general, most operating systems try to store the data in a contiguous way so as to minimize data fragmentation. Seek time on mechanical drives is significant when reading or writing data, which is why it is important to keep fragmentation to a minimum.
When a file is deleted, the meta-information about this file (file name, date/time, size, location of the first data block/cluster, etc.) is lost; for example, in an ext3/ext4 file system, the names of deleted files are still present, but the location of the first data block is removed. This means the data is still present on the file system, but only until some or all of it is overwritten by new file data.
To recover these lost files, PhotoRec first tries to find the data block (or cluster) size. If the file system is not corrupted, this value can be read from the superblock (ext2/ext3/ext4) or volume boot record (FAT, NTFS). Otherwise, PhotoRec reads the media, sector by sector, searching for the first ten files, and calculates the block/cluster size from their locations. Once this block size is known, PhotoRec reads the media block by block (or cluster by cluster). Each block is checked against a signature database that ships with the program; the range of file types it can recover has grown since PhotoRec’s first release.
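The fallback described above, deriving a block size from where the first few files start, can be illustrated with a rough Python sketch. This is not PhotoRec's actual code: the function name `estimate_block_size`, the 512-byte sector size, the 64 KiB upper bound, and the power-of-two search are all assumptions made for illustration.

```python
from functools import reduce
from math import gcd

SECTOR_SIZE = 512  # assumed sector size in bytes


def estimate_block_size(file_offsets, sector_size=SECTOR_SIZE, max_block=64 * 1024):
    """Guess a block size from the byte offsets where files were found.

    Hypothetical heuristic: take the largest power-of-two multiple of the
    sector size that divides the distance between every file start and the
    first file start. Differences are used so that a data area that does not
    begin on a block boundary does not spoil the estimate.
    """
    if len(file_offsets) < 2:
        return sector_size
    base = file_offsets[0]
    common = reduce(gcd, (off - base for off in file_offsets[1:]))
    if common == 0:
        return sector_size
    block = sector_size
    # Double the candidate while it still divides every distance.
    while block * 2 <= max_block and common % (block * 2) == 0:
        block *= 2
    return block
```

With file starts at byte offsets 13312, 29696 and 41984, the distances are multiples of 4096 but not of 8192, so the sketch settles on 4096.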
For example, PhotoRec identifies a JPEG file when a block begins with:
0xff, 0xd8, 0xff, 0xe0
0xff, 0xd8, 0xff, 0xe1
or 0xff, 0xd8, 0xff, 0xfe
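As a small illustration of the signature check, the three byte sequences listed above can be tested against the start of a block in Python. The names `JPEG_SIGNATURES` and `looks_like_jpeg` are made up for this sketch and are not taken from PhotoRec itself.

```python
# The three JPEG signatures quoted above, as raw bytes.
JPEG_SIGNATURES = (
    bytes([0xFF, 0xD8, 0xFF, 0xE0]),
    bytes([0xFF, 0xD8, 0xFF, 0xE1]),
    bytes([0xFF, 0xD8, 0xFF, 0xFE]),
)


def looks_like_jpeg(block: bytes) -> bool:
    """Return True if the block begins with one of the listed signatures."""
    # bytes.startswith accepts a tuple of candidate prefixes.
    return block.startswith(JPEG_SIGNATURES)
```

A real signature database covers many more formats; this sketch only handles the JPEG case mentioned in the text.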
If PhotoRec has already started to recover a file, it stops its recovery, checks the consistency of the file when possible and starts to save the new file (which it determined from the signature it found).
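The "stop the current file when a new signature appears" behaviour can be sketched as a simple loop over blocks. This is a simplified model under stated assumptions: the whole medium is held in memory as `image`, only the JPEG signatures from the text are known, and the consistency check is left out. The `carve` function is hypothetical and not part of PhotoRec.

```python
# JPEG signatures from the text; a real tool would know many more formats.
JPEG_SIGNATURES = (
    b"\xff\xd8\xff\xe0",
    b"\xff\xd8\xff\xe1",
    b"\xff\xd8\xff\xfe",
)


def carve(image: bytes, block_size: int, signatures=JPEG_SIGNATURES):
    """Return a list of (start_offset, data) pairs, one per carved file.

    A block that begins with a known signature closes the file currently
    being recovered (if any) and starts a new one at that block.
    """
    files = []
    start = None  # offset of the file currently being recovered
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        if block.startswith(signatures):
            if start is not None:
                # New signature found: stop the file in progress here.
                files.append((start, image[start:offset]))
            start = offset
    if start is not None:
        # The last file runs to the end of the medium in this sketch.
        files.append((start, image[start:]))
    return files
```

Because the sketch simply collects every block up to the next signature, a carved file can be larger than the original, which matches the behaviour the text goes on to describe.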
If the data is not fragmented, the recovered file should be either identical to or larger than the original file in size. In some cases, PhotoRec can learn the original file size from the file header, so the recovered file is truncated to the correct size. If, however, the recovered file ends up being smaller than its header specifies, it is discarded. Some files, such as *.MP3 types, are data streams. In this case, PhotoRec parses the recovered data, then stops the recovery when the stream ends.
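The truncate-or-discard rule for files whose header states a size might look like the following sketch. The function name `finalize` and the convention of passing `None` when no size is known are assumptions for illustration only.

```python
from typing import Optional


def finalize(recovered: bytes, header_size: Optional[int]) -> Optional[bytes]:
    """Apply the size rule to a carved file.

    header_size is the length claimed by the file header, or None when the
    format does not record one. Returns the bytes to keep, or None when the
    file should be discarded.
    """
    if header_size is None:
        return recovered          # no size information: keep everything carved
    if len(recovered) < header_size:
        return None               # shorter than the header claims: discard
    return recovered[:header_size]  # trim trailing blocks past the real end
```

Stream formats such as MP3 do not fit this rule, since the end of the file is found by parsing the stream rather than by reading a size from the header.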
When a file is recovered successfully, PhotoRec checks the previous data blocks to see if a file signature was found there but the file could not be recovered (that is, the file was too small), and it tries again. This way, some fragmented files can be successfully recovered.”