|
Replies:
16
-
Last Post:
Apr 14, 2009 2:27 PM
by: schlie
|
Threads:
[
Previous
|
Next
]
|
|
Posts:
15
From:
Registered:
8/11/08
|
|
|
|
fletcher2/4 implementations fundamentally flawed
Posted:
Aug 17, 2008 8:45 AM
To: Communities » zfs » code
|
|
As a heads up, Fletcher class of checksum algorithms are reasonably strong because they are specified to perform 1's complement sums (i.e. end around carries [(a + b) mod (2^n -1)]) such that sums don't overflow into oblivion and thereby loose critical information.
Unsigned additions in "C" are not 1's complement additions.
As presently implemented (and arguably not even Fletcher checksums), they are incredibly weak, and not even capable of generating unique checksums across large blocks of zero valued data for example, containing single bit errors which are arguably not uncommon.
If one reviews carefully correct Fletcher checksum implementations, it will become apparent that either adds with carry are utilized if implemented in assembly language, or larger accumulators are used to compute the sum (such that upper carries are not lost), and then the upper portion of the accumulator are iteratively then added to the lower portion of the sum until no further carries are generated, and thereby effectively performing a 1's complement modular addition.
Please consider fixing the implementation, as it's presently deceptively weak, although perceived to be otherwise riding on the coattails of the relative strength of Fletcher checksums, which the implementation is not.
[As an aside, as I believe that most otherwise undetected checksum miss matches are most likely caused by the introduction of single bit errors during the transport of data to or from storage or otherwise, having a stronger checksum implementation having a hamming distance of at least 2 although more ideally 3, provides the opportunity to potentially enable the correction of single bit errors with reasonable probability if all else fails in lieu of relying on the data being hopefully archived somewhere else or resulting in catastrophic otherwise.]
|
|
|
Posts:
15
From:
Registered:
8/11/08
|
|
|
|
Re: fletcher2/4 implementations fundamentally flawed (correction)
Posted:
Aug 18, 2008 4:41 PM
in response to: schlie
To: Communities » zfs » code
|
|
CORRECTION: As fletcher4 actually sums 32bit (not 64bit) data using 64bit accumulators, the first two checksum terms (a and b) will not overflow for data block sizes as large as 128KB, and therefore should be considered at least as strong as that expected of a 32/64bit Fletcher checksum (although the remaining terms (c and d) may overflow); and thereby have a worst case hamming distance of at least 3 for zfs's maximum 128KB block size; which is a good thing. (sorry for my earlier miss-diagnosis)
fletcher2 (zfs's default) however remains arguably flawed, however may be easily strengthened by correspondingly being restricted to summing 32bit data using 64bit accumulators, or possibly alternatively summing 64bit data using 128bit accumulators if it's viewed reasonably supportable by target architectures/compilers likely to host zfs.
|
|
|
|
Jonathan Adams
jonathan.adams@sun.com
|
|
|
|
Re: fletcher2/4 implementations fundamentally
flawed (correction)
Posted:
Aug 22, 2008 2:24 PM
in response to: schlie
|
|
On Mon, Aug 18, 2008 at 04:41:35PM -0700, paul wrote: > CORRECTION: As fletcher4 actually sums 32bit (not 64bit) data using 64bit accumulators, > the first two checksum terms (a and b) will not overflow for data block sizes as large as > 128KB, and therefore should be considered at least as strong as that expected of a 32/64bit > Fletcher checksum (although the remaining terms (c and d) may overflow); and thereby have > a worst case hamming distance of at least 3 for zfs's maximum 128KB block size; which is a > good thing. (sorry for my earlier miss-diagnosis) > > fletcher2 (zfs's default) however remains arguably flawed, however may be easily strengthened > by correspondingly being restricted to summing 32bit data using 64bit accumulators, or possibly > alternatively summing 64bit data using 128bit accumulators if it's viewed reasonably supportable > by target architectures/compilers likely to host zfs.
Thanks for the report. I've filed:
6740597 zfs fletcher-2 is losing its carries
to track this.
Cheers, - jonathan
> -- > This messages posted from opensolaris.org > _______________________________________________ > zfs-code mailing list > zfs-code at opensolaris dot org > http://mail.opensolaris.org/mailman/listinfo/zfs-code _______________________________________________ zfs-code mailing list zfs-code at opensolaris dot org http://mail.opensolaris.org/mailman/listinfo/zfs-code
|
|
|
|
Posts:
7
From:
Registered:
3/27/09
|
|
|
|
Re: fletcher2/4 implementations fundamentally flawed
Posted:
Mar 27, 2009 8:42 PM
in response to: schlie
To: Communities » zfs » code
|
|
As I understand it, there is no way to fix a problem with the algorithm of a defined checksum without invalidating existing zfs filesystems. Any fix to to the fletcher2 will have to be given a new name.
Given how incredibly weak the current fletcher2 is, perhaps the first thing that should be done is to change the default to fletcher4. The flawed fletcher2 appears to be 32768 times weaker than the 16 bit TCP checksum algorithm, i.e. it appears to only have a 50% chance of catching a single bit error or any series of single bit errors in the most significant bit of any 64 bit word in a disk block. For those bits, it is equivalent to a *1 bit* checksum.
|
|
|
|
Posts:
2,083
From:
US
Registered:
6/17/05
|
|
|
|
Re: [zfs-code] fletcher2/4 implementations fundamentally flawed
Posted:
Mar 27, 2009 9:02 PM
in response to: butlerm
|
| | | |