Using Integrit to verify a file system

I'm using the Integrit file-integrity system (it builds an MD5 checksum
database of all the files listed in a configuration file) to verify
system loads of a Linux distribution I create and distribute to a large
group of users (this is an embedded application running on a commercial
device). The method is simple: the user loads the new software (which
wipes the drive and creates new partitions and a new filesystem), and
once the system has been loaded they push a button on a GUI that runs
Integrit against a database I generate before distributing the load,
comparing the installed system to the system I shipped.
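
In case it helps anyone picture it, what that verification amounts to is
roughly the following. This is only a conceptual sketch in Python, not
Integrit itself; the paths, the walk over the tree, and the way the known
checksums are stored are all placeholders, and the real Integrit database
records much more than checksums.

    import hashlib
    import os

    def md5_of_file(path, chunk_size=65536):
        """MD5 of a file's logical contents, read through the filesystem."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    def build_database(root):
        """Walk the tree and record a checksum for every regular file."""
        db = {}
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    db[os.path.relpath(path, root)] = md5_of_file(path)
        return db

    def verify(root, known_db):
        """Report files whose current checksum differs from the known one."""
        current = build_database(root)
        for rel, digest in known_db.items():
            if current.get(rel) != digest:
                print("MISMATCH:", rel)

    # On the build machine: known = build_database("/target/root"), then store
    # the dict however you like (Integrit uses its own database format).
    # On the installed device: verify("/", known) against the shipped database.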

The problem I'm experiencing is that we're seeing many failures
involving a dozen or fewer files. We load the system automatically from
a CD. Sometimes the load works fine and the file system verifies as
identical to what I created. Many times, though, several files (between
5 and 12) show checksum differences. Reloading doesn't help, and it's
always different files that have the checksum errors. We've seen this
with two or three new hard drives in a row. I can't believe we've got
that much of a hardware problem, which makes me think I'm not fully
understanding what Integrit is doing.

Could someone with some experience with Integrit give me an idea of
what might be going on here? I'm only checksumming the files (Integrit
can do a lot more). Is it possible that fragmentation could cause the
checksums to differ without there really being a problem? The affected
files are often man pages and library files associated with things that
are not important to the operation of our system, but they do seem to
indicate an underlying problem with the hardware. To ignore it would be
to abandon our system of verification.

Any help or insight would be greatly appreciated.

Re: Using Integrit to verify a file system

On 16 Mar 2006 06:41:37 -0800, James Kimble wrote:
If the checksums are different, the content of the files must be
different.  Have you tried to find the differences?
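
One low-tech way to do that, sketched below in Python, is to compare the
shipped copy against the installed copy byte by byte and report where they
first diverge (the file names are placeholders; a tool such as cmp would do
the same job):

    def first_difference(path_a, path_b, chunk_size=65536):
        """Return the byte offset of the first difference, or None if identical."""
        offset = 0
        with open(path_a, "rb") as a, open(path_b, "rb") as b:
            while True:
                chunk_a = a.read(chunk_size)
                chunk_b = b.read(chunk_size)
                if chunk_a != chunk_b:
                    # Walk the chunks to find the exact differing byte.
                    for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                        if x != y:
                            return offset + i
                    # One file is a prefix of the other (lengths differ).
                    return offset + min(len(chunk_a), len(chunk_b))
                if not chunk_a:          # both files exhausted, no difference
                    return None
                offset += len(chunk_a)

    # Hypothetical usage: compare a file from the CD image with the installed one.
    # print(first_difference("/mnt/cdrom/usr/lib/libfoo.so", "/usr/lib/libfoo.so"))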

A memorandum is written not to inform the reader, but to protect the writer.
        -- Dean Acheson

Re: Using Integrit to verify a file system

I haven't looked at the files (most are binary), but that's a good idea.
As I've been studying the problem I'm becoming more convinced that this
is a drive (IDE) configuration issue. These are older motherboards
(PIII), and the problem seems most common with a drive we recently
switched to from an older model. The older drive doesn't seem to have
this problem. Some of the hardware is recycled, so the potential for
hardware issues is fairly large. I'm increasingly convinced that
Integrit is just showing us problems we always had but weren't aware
of. I'm not positive of that yet, but the evidence is building.
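
For what it's worth, if it is an IDE/DMA problem the kernel usually
complains about it, so the kernel log can be scanned for the usual IDE
error signatures. A rough Python sketch follows; the log path and the
exact message strings are assumptions and vary by kernel and distribution:

    # Hypothetical check: scan the kernel log for common IDE error messages.
    SIGNATURES = ("dma_intr", "DriveStatusError", "BadCRC", "ide: failed opcode")

    def scan_kernel_log(path="/var/log/messages"):
        """Return log lines that look like IDE/DMA errors."""
        hits = []
        with open(path, "r", errors="replace") as log:
            for line in log:
                if any(sig in line for sig in SIGNATURES):
                    hits.append(line.rstrip())
        return hits

    if __name__ == "__main__":
        for line in scan_kernel_log():
            print(line)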

Thanks for your comments though. All thoughts welcome.

Re: Using Integrit to verify a file system

I have never implemented the Integrit system, but I have used various
other host and/or network integrity systems. What you are describing
seems to be a data problem rather than a physical one: regardless of
which physical medium holds a given object (file), the same file should
cryptographically verify as such, as long as it hasn't been altered. Or
has it?

Most filesystems use a block structure to read and write data to a
block device such as a disk; block sizes are typically a few kilobytes
(4 KB is common on Linux). If a file is only 3 KB in logical size, that
leaves roughly 1 KB of slack space in its last block. This slack space
should not affect the integrity of the data object, because a checksum
reads only up to the file's logical end. The exception would be a
corrupt filesystem whose end-of-file pointer (which is tracked
differently by each filesystem) is out of place, allowing previously
written data, if not securely wiped, to be concatenated onto your
original object. That is very unlikely, though. Some other problem may
be altering the data, perhaps around reboot, e.g. journaling, or cached
reads and writes during a data sync.
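
To make the slack-space point concrete, here is a small Python sketch
(the path is only an example): a file's logical size is what a read or a
checksum sees, independent of the blocks actually allocated for it on disk.

    import hashlib
    import os

    def describe(path):
        """Show a file's logical size next to the disk space allocated for it."""
        st = os.stat(path)
        logical = st.st_size            # bytes that read()/a checksum will see
        allocated = st.st_blocks * 512  # st_blocks is counted in 512-byte units
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        print(path, "logical:", logical, "allocated:", allocated, "md5:", digest)

    # A 3 KB file on a filesystem with 4 KB blocks will typically show
    # logical=3072 but allocated=4096; the slack space never enters the hash.
    describe("/etc/hostname")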

I would also suggest changing your hash algorithm. The MD5 algorithm is
commonly known to be broken and insecure; I would suggest SHA-1 or,
better, SHA-256. Also, your hash/checksum application itself may be
compromised and should be treated as suspect; I would verify the
integrity of the application before using it.
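
Switching digests is cheap to try. Below is a minimal Python sketch (the
file name is a placeholder) that computes MD5, SHA-1 and SHA-256 of the
same file in one pass, so a mismatching copy can be cross-checked with
more than one algorithm:

    import hashlib

    def digests(path, chunk_size=65536):
        """Compute several digests of a file in a single pass over its contents."""
        hashes = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                for h in hashes.values():
                    h.update(chunk)
        return {name: h.hexdigest() for name, h in hashes.items()}

    # Hypothetical usage on one of the files that fails verification:
    # print(digests("/usr/lib/libfoo.so"))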

By the way, are you using different processors or architectures across
your systems, e.g. K7, P3, or PowerPC? A compilation on different
processors may include CPU-specific flags to optimize the application
for its environment.

This is an interesting dilemma; could you cc: me on your findings and/or ...
