OpenSolaris

Discussions Communities Projects Download Source Browser

Home » OpenSolaris Forums » zfs » code

Thread: fletcher2/4 implementations fundamentally flawed

Welcome, Guest Help
Login Login
Guest Settings Guest Settings
Reply to this Thread Reply to this Thread Search Forum Search Forum Back to Thread List Back to Thread List

Permlink Replies: 16 - Last Post: Apr 14, 2009 2:27 PM by: schlie Threads: [ Previous | Next ]
schlie

Posts: 15
From:

Registered: 8/11/08
fletcher2/4 implementations fundamentally flawed
Posted: Aug 17, 2008 8:45 AM
To: Communities » zfs » code
  Click to reply to this thread Reply

As a heads up, Fletcher class of checksum algorithms are reasonably
strong because they are specified to perform 1's complement sums
(i.e. end around carries [(a + b) mod (2^n -1)]) such that sums
don't overflow into oblivion and thereby loose critical information.

Unsigned additions in "C" are not 1's complement additions.

As presently implemented (and arguably not even Fletcher checksums),
they are incredibly weak, and not even capable of generating unique
checksums across large blocks of zero valued data for example,
containing single bit errors which are arguably not uncommon.

If one reviews carefully correct Fletcher checksum implementations,
it will become apparent that either adds with carry are utilized if
implemented in assembly language, or larger accumulators are used
to compute the sum (such that upper carries are not lost), and then
the upper portion of the accumulator are iteratively then added to
the lower portion of the sum until no further carries are generated,
and thereby effectively performing a 1's complement modular addition.

Please consider fixing the implementation, as it's presently deceptively
weak, although perceived to be otherwise riding on the coattails of the
relative strength of Fletcher checksums, which the implementation is not.

[As an aside, as I believe that most otherwise undetected checksum miss
matches are most likely caused by the introduction of single bit errors
during the transport of data to or from storage or otherwise, having a
stronger checksum implementation having a hamming distance of at least
2 although more ideally 3, provides the opportunity to potentially enable
the correction of single bit errors with reasonable probability if all else
fails in lieu of relying on the data being hopefully archived somewhere
else or resulting in catastrophic otherwise.]

schlie

Posts: 15
From:

Registered: 8/11/08
Re: fletcher2/4 implementations fundamentally flawed (correction)
Posted: Aug 18, 2008 4:41 PM   in response to: schlie
To: Communities » zfs » code
  Click to reply to this thread Reply

CORRECTION: As fletcher4 actually sums 32bit (not 64bit) data using 64bit accumulators,
the first two checksum terms (a and b) will not overflow for data block sizes as large as
128KB, and therefore should be considered at least as strong as that expected of a 32/64bit
Fletcher checksum (although the remaining terms (c and d) may overflow); and thereby have
a worst case hamming distance of at least 3 for zfs's maximum 128KB block size; which is a
good thing. (sorry for my earlier miss-diagnosis)

fletcher2 (zfs's default) however remains arguably flawed, however may be easily strengthened
by correspondingly being restricted to summing 32bit data using 64bit accumulators, or possibly
alternatively summing 64bit data using 128bit accumulators if it's viewed reasonably supportable
by target architectures/compilers likely to host zfs.

Jonathan Adams
jonathan.adams@sun.com
Re: fletcher2/4 implementations fundamentally flawed (correction)
Posted: Aug 22, 2008 2:24 PM   in response to: schlie

  Click to reply to this thread Reply

On Mon, Aug 18, 2008 at 04:41:35PM -0700, paul wrote:
> CORRECTION: As fletcher4 actually sums 32bit (not 64bit) data using 64bit accumulators,
> the first two checksum terms (a and b) will not overflow for data block sizes as large as
> 128KB, and therefore should be considered at least as strong as that expected of a 32/64bit
> Fletcher checksum (although the remaining terms (c and d) may overflow); and thereby have
> a worst case hamming distance of at least 3 for zfs's maximum 128KB block size; which is a
> good thing. (sorry for my earlier miss-diagnosis)
>
> fletcher2 (zfs's default) however remains arguably flawed, however may be easily strengthened
> by correspondingly being restricted to summing 32bit data using 64bit accumulators, or possibly
> alternatively summing 64bit data using 128bit accumulators if it's viewed reasonably supportable
> by target architectures/compilers likely to host zfs.

Thanks for the report. I've filed:

6740597 zfs fletcher-2 is losing its carries

to track this.

Cheers,
- jonathan

> --
> This messages posted from opensolaris.org
> _______________________________________________
> zfs-code mailing list
> zfs-code at opensolaris dot org
> http://mail.opensolaris.org/mailman/listinfo/zfs-code
_______________________________________________
zfs-code mailing list
zfs-code at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-code


butlerm

Posts: 7
From:

Registered: 3/27/09
Re: fletcher2/4 implementations fundamentally flawed
Posted: Mar 27, 2009 8:42 PM   in response to: schlie
To: Communities » zfs » code
  Click to reply to this thread Reply

As I understand it, there is no way to fix a problem with the algorithm of a defined checksum without invalidating existing zfs filesystems. Any fix to to the fletcher2 will have to be given a new name.

Given how incredibly weak the current fletcher2 is, perhaps the first thing that should be done is to change the default to fletcher4. The flawed fletcher2 appears to be 32768 times weaker than the 16 bit TCP checksum algorithm, i.e. it appears to only have a 50% chance of catching a single bit error or any series of single bit errors in the most significant bit of any 64 bit word in a disk block. For those bits, it is equivalent to a *1 bit* checksum.

relling

Posts: 2,083
From: US

Registered: 6/17/05
Re: [zfs-code] fletcher2/4 implementations fundamentally flawed
Posted: Mar 27, 2009 9:02 PM   in response to: butlerm

  Click to reply to this thread

Terms of Use | Privacy | Trademarks | Copyright Policy | Site Guidelines
Your use of this web site or any of its content or software indicates your agreement to be bound by these Terms of Use.
© 2010, Oracle Corporation and/or its affiliates

Oracle