Is my data checksummed?

Peter Grandi pg_ext3 at ext3.for.sabi.co.UK
Wed Feb 24 16:38:14 UTC 2010


>>> What checksumming is done for the actual data?  I know that
>>> storage devices often do their own checksumming too, but how
>>> can I be sure my data is integrity checked every time I read
>>> it?

These things ("storage devices often do their own checksumming"
and "my data is integrity checked") are rather unrelated.

Various parts of storage subsystems do things like checksumming
not to protect your data, but to detect potential faults. That
is mainly a diagnostic, not an integrity guarantee.

Part of the reason is that it is very difficult and needlessly
expensive to do comprehensive integrity checking within the
storage subsystem, automagically.

>> If you use disks that support the Data Integrity Field (DIF)
>> extension, Linux will use it to provide end-to-end data
>> checksum support.  Otherwise, there are checksums on the disk
>> and between disk controller and the CPU, but those are
>> obviously not end-to-end checksums.

Yes. But I'll add that the only way to ensure that "data is
integrity checked" is to do it truly end-to-end, with data- and
application-specific checks. For example, as a weak but useful
measure I 'zip' or 'gzip' (sometimes with zero compression if
already compressed) data that I want to be able to move around
across years and many storage devices.
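
A minimal sketch of the 'gzip' approach, assuming Python and its
standard gzip module. The gzip format carries a CRC-32 of the
uncompressed data in its trailer, so unpacking doubles as a (weak)
integrity check. The helper names pack, unpack_checked and
is_intact are hypothetical, chosen only for illustration.

```python
import gzip
import zlib


def pack(data: bytes, level: int = 0) -> bytes:
    """Wrap data in a gzip container.

    Level 0 stores the data without compressing it, which is enough
    when the data is already compressed; the CRC-32 trailer is
    written either way.
    """
    return gzip.compress(data, compresslevel=level)


def unpack_checked(blob: bytes) -> bytes:
    """Return the original data; raises if the container is damaged."""
    return gzip.decompress(blob)


def is_intact(blob: bytes) -> bool:
    """True if the gzip container unpacks and its CRC-32 matches."""
    try:
        unpack_checked(blob)
        return True
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile (CRC or header failure) is a subclass of OSError.
        return False
```

The check only says that the bytes unpacked now are the bytes that
were packed then. It says nothing about whether the right bytes
were packed in the first place.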

Consider for example bugs in the IO subsystem itself, where the
wrong data ends up being written and checksummed, and gets
validated every time even if it is not the right data.
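
That failure mode can be illustrated with a toy model. This is a
sketch only: BuggyStore is a hypothetical stand-in for a storage
layer, not any real driver or file system, and its "bug" is
invented for the example. The store checksums whatever it ends up
writing, so its own check passes, while a digest taken by the
application before the write does not match.

```python
import hashlib
import zlib


class BuggyStore:
    """Toy storage layer (hypothetical): checksums each record it
    writes, but a bug mangles the data before the checksum is taken."""

    def __init__(self):
        self._records = {}

    def write(self, key: str, data: bytes) -> None:
        # The bug: the last byte is overwritten before checksumming.
        mangled = data[:-1] + b"?" if data else data
        self._records[key] = (mangled, zlib.crc32(mangled))

    def read(self, key: str) -> bytes:
        data, crc = self._records[key]
        if zlib.crc32(data) != crc:
            raise IOError("storage checksum mismatch")
        return data


payload = b"important data"
# Digest taken by the application, before the data enters the I/O path.
app_digest = hashlib.sha256(payload).hexdigest()

store = BuggyStore()
store.write("k", payload)
readback = store.read("k")  # the store's own check passes

# Only the application-level comparison notices the wrong data.
ok = hashlib.sha256(readback).hexdigest() == app_digest  # False
```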

> Just to be clear, even with a storage path that supports
> DIF/DIX, we don't currently do anything for applications on
> top of file systems. The primary application to target storage
> path is covered mainly for raw devices.

Which makes it not that generally useful. In effect DIF is a hw
accelerator of a weak form of per-block checksumming. I think
that most current CPUs are fast enough to do it without it
being that noticeable.
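
For illustration, per-block checksumming done in software might
look like the sketch below. It assumes Python, a 512-byte block
size, and CRC-32 from zlib as a stand-in (DIF itself uses a 16-bit
CRC guard tag per sector); the function names are hypothetical.

```python
import zlib
from typing import List

BLOCK_SIZE = 512  # assumed block (sector) size in bytes


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return one CRC-32 per block of data; the last block may be short."""
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def find_bad_blocks(
    data: bytes, expected: List[int], block_size: int = BLOCK_SIZE
) -> List[int]:
    """Return the indices of blocks whose checksum differs from expected.

    Assumes data has the same number of blocks as expected; extra
    blocks on either side are not compared.
    """
    actual = block_checksums(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

The checksums would be computed when the data is written and
compared when it is read back. Like DIF, this only detects a block
that changed after it was checksummed.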

>> Adding data-level checksums is not something that we are
>> planning on adding to the ext2/3/4 file systems.  BTRFS is
>> the only file system that has data-level checksums, but it's
>> not yet production ready.

But again that's not end-to-end. It is just as far as the
current storage system goes, and the biggest value, like for
ZFS, is to detect issues with the storage system itself (e.g.
bugs as well as hw issues).




More information about the Ext3-users mailing list