Replies: 24 - Last Post: Oct 14, 2009 3:50 PM by: sofaman
ZFS and deduplication?
Posted:
Mar 31, 2009 2:32 AM
To: Communities » zfs » discuss
Cc: Communities » zfs » code
As a NetApp user I have grown very fond of the deduplication feature we have on our filers, and some time ago I heard that dedup was being investigated for ZFS as well.
Do you guys know how far this has come?
I would really like to build a low-cost backup appliance using ZFS and dedup. :)
Re: [zfs-code] ZFS and deduplication?
Posted:
Mar 31, 2009 2:47 AM
in response to: joachims
Yes -- dedup is my (and Bill's) current project. Prototyped in December. Integration this summer. I'll blog all the details when we integrate, but it's what you'd expect of ZFS dedup -- synchronous, no limits, etc.
Jeff
On Tue, Mar 31, 2009 at 02:32:11AM -0700, Joachim Sandvik wrote:
> As a NetApp user I have grown very fond of the deduplication feature we have on our filers, and some time ago I heard that dedup was being investigated for ZFS as well.
>
> Do you guys know how far this has come?
>
> I would really like to build a low-cost backup appliance using ZFS and dedup. :)
> --
> This message posted from opensolaris.org
_______________________________________________
zfs-code mailing list
zfs-code at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-code
Re: [zfs-code] ZFS and deduplication?
Posted:
Mar 31, 2009 7:23 AM
in response to: bonwick
To: Communities » zfs » code
Jeff,
That's great news. Thanks for sharing that. I'm really looking forward to reading about the details.
Best Regards
Nigel Smith
Re: [zfs-code] ZFS and deduplication?
Posted:
Apr 4, 2009 3:19 PM
in response to: bonwick
To: Communities » zfs » code
> Yes -- dedup is my (and Bill's) current project.
> Prototyped in December.
> Integration this summer. I'll blog all the details
> when we integrate,
> but it's what you'd expect of ZFS dedup --
> synchronous, no limits, etc.
I am not sure about the "what you'd expect" part. Previous discussions here showed interest in a synchronous version, but also in asynchronous versions (one that works in the background, or one you can run occasionally like a scrub; there are plenty of possibilities).
I am not complaining, though: the synchronous version is an interesting one, and it should be easier to add the other versions afterwards.
Re: [zfs-code] ZFS and deduplication?
Posted:
Apr 19, 2009 1:15 AM
in response to: bonwick
To: Communities » zfs » code
Awesome news, Jeff. I know you said you'd write about it later, but I want to pose these questions now for several reasons:
- I'm excited and eager and can't wait :-)
- There may be things we could do now to prepare existing data and pools for easier dedup later
- There may be useful hints in here for documentation, test cases, further RFEs, etc.
So, in no particular order:
- will it use only the existing checksums, or an additional compare or method?
- will it depend on using a particular (e.g. stronger) checksum? would it help to switch now to that checksum method so blocks written in the meantime are "ready"? (I'm already concerned about the fletcher2 implementation thread and will likely switch anyway)
- will it dedup across the entire pool, or only within a dataset?
- will it be enable/disable per dataset? (space vs speed)
- will it interact with copies=>1? especially where dup blocks exist between datasets that differ in copies= settings? I hope I'd get new ditto blocks for the highest copies= referrer, but then what about when that dataset is destroyed and there are more copies than needed?
- will it interact with compression (i.e., does it dedup source blocks or on-disk blocks)? If I write the same files to datasets with differing compression settings, how many copies do I store?
- will it detect only whole blocks with the same alignment, or is there something I can do to improve detection of smaller duplicate blocks and split them?
- will there be a way for me to examine files for the "dup nature" (I'm thinking of something like seeking for holes) at the app level, to use the information the fs has already discovered?
- will it depend on bp-rewrite at all? (for delivery; I presume bp-rewrite will be needed to dedup existing blocks, but is there an implementation dependency that entangles these two somehow, such that we need to wait for both?)
- will zfs send be able to avoid sending multiple copies of dup data?
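To make a couple of these questions concrete (checksum-only matching versus an additional compare, and whole-block matching at a fixed alignment), here is a toy model in Python. It is only a sketch under stated assumptions, not ZFS code: `DedupStore`, `BLOCK_SIZE`, and the `verify` flag are hypothetical names, and SHA-256 simply stands in for whichever checksum a real implementation would use.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size for this toy model


class DedupStore:
    """Toy content-addressed block store; NOT how ZFS is implemented."""

    def __init__(self, verify=False):
        self.verify = verify   # if True, byte-compare whenever checksums match
        self.blocks = {}       # checksum -> block data (stored once)
        self.refcount = {}     # checksum -> number of references

    def write_block(self, data):
        key = hashlib.sha256(data).digest()
        if key in self.blocks:
            # Checksum match: optionally confirm the bytes really are equal.
            if self.verify and self.blocks[key] != data:
                raise ValueError("checksum collision detected")
            self.refcount[key] += 1
        else:
            self.blocks[key] = data
            self.refcount[key] = 1
        return key

    def write_file(self, payload):
        """Split payload at fixed offsets and return its list of block keys."""
        return [self.write_block(payload[i:i + BLOCK_SIZE])
                for i in range(0, len(payload), BLOCK_SIZE)]


def demo():
    store = DedupStore(verify=True)
    # Four distinct blocks, each filled with a single byte value.
    payload = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
    store.write_file(payload)
    store.write_file(payload)            # identical copy: no new blocks stored
    aligned = len(store.blocks)
    store.write_file(b"\xff" + payload)  # same data shifted by one byte
    shifted = len(store.blocks)
    return aligned, shifted
```

In this model an identical second copy adds no blocks, while the same data shifted by a single byte shares nothing with the original, which is exactly the alignment concern behind the question about detecting smaller duplicate blocks.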
Mitchell Erblich
erblichs@earthlink.net
Re: [zfs-code] ZFS and deduplication?
Posted:
Apr 19, 2009 2:54 AM
in response to: uep
Group,
Within this long, and I am sure incomplete, list...
Let me add some thoughts for the future from a non-ZFS developer.
The immediate items below are just for thought.
OK, I assume it works at the block level, but I could easily be wrong. When a ZFS file block is modified, isn't it read, copied, and then the inode updated to point to the new block?
IMO, based on #3, I don't think you need to support block splitting, and if you did, what would prevent heavily modified files/objects from degrading to the smallest block size supported?
Can't an app find holes now?
My (immediate) short list is:
1) How are you going to support backward compatibility, i.e. removing existing dups, whether they are located locally and/or network-wide?
2) Other than additional code space and code complication, what level of performance degradation should we expect from what must be some hash lookup, etc., added to the code fastpath?
3) With storage capacities/density rapidly rising, and with the ability to mirror data for disaster recovery, load balancing, and single-digit-millisecond LAN access times vs. WAN access times, how does a single administrator within the LAN determine the level of support/tradeoffs of this new feature within a global co..
4) What disk/file objects are considered inappropriate for dedup?
5) How will you support Direct I/O, or will dedup be supported with Direct I/O at all?
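The copy-on-write point above (a modified block is written somewhere new and the pointer is updated, rather than being overwritten in place) meets dedup through reference counts. The following is a minimal sketch of that interaction, assuming a hypothetical in-memory table; `RefcountedBlocks` and `cow_update` are illustrative names and none of this is taken from the ZFS source.

```python
import hashlib


class RefcountedBlocks:
    """Toy dedup table with reference counts; names are illustrative only."""

    def __init__(self):
        self.table = {}  # checksum -> [data, refcount]

    def ref(self, data):
        """Add one reference to the block holding `data`, storing it if new."""
        key = hashlib.sha256(data).hexdigest()
        entry = self.table.setdefault(key, [data, 0])
        entry[1] += 1
        return key

    def unref(self, key):
        """Drop one reference; reclaim the block when nobody points at it."""
        entry = self.table[key]
        entry[1] -= 1
        if entry[1] == 0:
            del self.table[key]


def cow_update(store, block_ptrs, index, new_data):
    """Copy-on-write: never overwrite in place; point at a new block instead."""
    old_key = block_ptrs[index]
    block_ptrs[index] = store.ref(new_data)
    store.unref(old_key)


def demo():
    store = RefcountedBlocks()
    file_a = [store.ref(b"A" * 512), store.ref(b"B" * 512)]
    file_b = [store.ref(b"A" * 512), store.ref(b"B" * 512)]  # full duplicate
    before = len(store.table)                 # two unique blocks for two files
    cow_update(store, file_b, 1, b"C" * 512)  # modify one block of file_b
    after = len(store.table)                  # "B" survives: file_a still uses it
    return before, after, file_a[1] != file_b[1]
```

The extra cost on the write path in this model is one hash computation plus one table lookup per block, which is the kind of fastpath overhead item 2 asks about; how large it would be in practice is not something this sketch can answer.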
Mitchell Erblich
------------------------------
On Apr 19, 2009, at 1:15 AM, Daniel Carosone wrote:
> Awesome news, Jeff. I know you said you'd write about it later, but
> I want to pose these questions now for several reasons:
> - I'm excited and eager and can't wait :-)
> - There may be things we could do now to prepare existing data and
> pools for easier dedup later
> - There may be useful hints in here for documentation, test cases,
> further RFEs, etc.
>
> So, in no particular order:
> - will it use only the existing checksums, or an additional compare
> or method?
> - will it depend on using a particular (e.g. stronger) checksum? would
> it help to switch now to that checksum method so blocks written in
> the meantime are "ready"? (I'm already concerned about the
> fletcher2 implementation thread and will likely switch anyway)
> - will it dedup across the entire pool, or only within a dataset?
> - will it be enable/disable per dataset? (space vs speed)
> - will it interact with copies=>1? especially where dup blocks exist
> between datasets that differ in copies= settings? I hope I'd get
> new ditto blocks for the highest copies= referrer, but then what
> about when that dataset is destroyed and there are more copies than
> needed?
> - will it interact with compression (i.e., does it dedup source
> blocks or on-disk blocks)? If I write the same files to datasets
> with differing compression settings, how many copies do I store?
> - will it detect only whole blocks with the same alignment, or is
> there something I can do to improve detection of smaller duplicate
> blocks and split them?
> - will there be a way for me to examine files for the "dup
> nature" (I'm thinking of something like seeking for holes) at the
> app level, to use the information the fs has already discovered?
> - will it depend on bp-rewrite at all? (for delivery; I presume
> bp-rewrite will be needed to dedup existing blocks, but is there an
> implementation dependency that entangles these two somehow, such
> that we need to wait for both?)
> - will zfs send be able to avoid sending multiple copies of dup data?
Re: ZFS and deduplication?
Posted:
May 16, 2009 11:43 AM
in response to: joachims
To: Communities » zfs » code
I've been trying to keep up with the latest news on ZFS and deduplication, and unfortunately there's not much news out there and this thread happens to contain the most recent information on the subject. That being said, I found an article that mentioned the following (http://www.technologyandbusiness.com.au/server-hardware-software/News/Sun-shines-more-light-on-Open-Source-Kernels.aspx) -
"Sun Microsystems will hold its first-ever Australian Kernel conference in July this year that will examine any open source kernels and technologies within those kernels. ... Sun Microsystems’ Sun Fellow and Vice President, Jeff Bonwick and Distinguished Engineer Bill Moore will present the opening keynote for the conference, titled 'Deduplication in ZFS,' which will likely include a live demonstration."
According to http://au.sun.com/sunnews/events/2009/kernel/, the above mentioned conference will be held in Brisbane, Australia, from July 15th to 17th, 2009.
Hopefully deduplication will come to SXCE shortly thereafter, and maybe we'll see it in a second OpenSolaris release this year after 2009.06 is released.
Re: [zfs-code] ZFS and deduplication?
Posted:
May 16, 2009 3:39 PM
in response to: bjquinn
On Sat, 16 May 2009 11:43:30 -0700 (PDT) BJ Quinn <bjquinn at seidal dot com> wrote:
> I've been trying to keep up with the latest news on ZFS and deduplication, and unfortunately there's not much news out there and this thread happens to contain the most recent information on the subject. That being said, I found an article that mentioned the following (http://www.technologyandbusiness.com.au/server-hardware-software/News/Sun-shines-more-light-on-Open-Source-Kernels.aspx) -
>
> "Sun Microsystems will hold its first-ever Australian Kernel conference in July this year that will examine any open source kernels and technologies within those kernels. ... Sun Microsystems’ Sun Fellow and Vice President, Jeff Bonwick and Distinguished Engineer Bill Moore will present the opening keynote for the conference, titled 'Deduplication in ZFS,' which will likely include a live demonstration."
>
> According to http://au.sun.com/sunnews/events/2009/kernel/, the above mentioned conference will be held in Brisbane, Australia, from July 15th to 17th, 2009.
>
> Hopefully deduplication will come to SXCE shortly thereafter, and maybe we'll see it in a second OpenSolaris release this year after 2009.06 is released.
Actually, I'm hoping that Jeff and Bill will have integrated ZFS Deduplication into NV before they get on the plane to come over here for KCA. We'll just have to wait and see :-)
Either way, whenever they integrate it into NV, it'll be in the next build of SXCE and should show up in OpenSolaris' dev repo shortly afterwards. There won't be another OpenSolaris _release_ this year after 2009.06.
Cheers,
James C. McPherson (chief instigator/agitator for Kernel Conference Australia)
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Kernel Conference Australia - http://au.sun.com/sunnews/events/2009/kernel
Re: [zfs-code] ZFS and deduplication?
Posted:
May 16, 2009 10:58 PM
in response to: jmcp
On Sun, May 17, 2009 at 12:39 AM, James C. McPherson <James dot McPherson at sun dot com> wrote:
> There won't be another OpenSolaris
> _release_ this year after 2009.06.
What do you mean by that?
--
Piotr Jasiukajtis | estibi | SCA OS0072
http://estseg.blogspot.com
Re: [zfs-code] ZFS and deduplication?
Posted:
May 17, 2009 3:24 AM
in response to: estibi
On Sun, 17 May 2009 07:58:49 +0200 Piotr Jasiukajtis <estseg at gmail dot com> wrote:
> On Sun, May 17, 2009 at 12:39 AM, James C. McPherson
> <James dot McPherson at sun dot com> wrote:
> > There won't be another OpenSolaris
> > _release_ this year after 2009.06.
> What do you mean by that?
That we've had 2 OpenSolaris binary distro releases so far - 2008.05 and 2008.11. As has been mentioned before, there's going to be 2009.06, and the next one is planned to be next year some time.
Builds of ON are separate to Releases.
Yay for taxonomies, and hair-splitting.
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Kernel Conference Australia - http://au.sun.com/sunnews/events/2009/kernel
Re: ZFS and deduplication?
Posted:
May 19, 2009 2:02 PM
in response to: joachims
To: Communities » zfs » code
> Either way, whenever they integrate it into NV, it'll be in
> the next build of SXCE and should show up in OpenSolaris'
> dev repo shortly afterwards.
Wait, does this mean that you could run the most recent version of OpenSolaris at the time (i.e. 2009.06) and just update to the newest version of ZFS from the dev repository?
You'll have to excuse my Linux background, but filesystems typically aren't the kind of thing you can update without updating the kernel itself. Never thought of just updating the fs independent of the kernel... or have I misunderstood something altogether?
Darren J Moffat
darrenm@opensolaris....
Re: [zfs-code] ZFS and deduplication?
Posted:
May 20, 2009 1:09 AM
in response to: bjquinn
BJ Quinn wrote:
>> Either way, whenever they integrate it into NV, it'll be in
>> the next build of SXCE and should show up in OpenSolaris'
>> dev repo shortly afterwards.
>
> Wait, does this mean that you could run the most recent version of OpenSolaris at the time (i.e. 2009.06) and just update to the newest version of ZFS from the dev repository?
>
> You'll have to excuse my Linux background, but filesystems typically aren't the kind of thing you can update without updating the kernel itself. Never thought of just updating the fs independent of the kernel... or have I misunderstood something altogether?
When you 'pkg image-update' it will update everything consistently so the kernel is updated as well as all the matching commands and libraries.
Unlike Linux, the OpenSolaris kernel isn't separate but is part of the same source base as libc and many of the core commands. The builds in the /dev repository are a carefully lined up collection of all the source bases (consolidations) that together form a build.
There is on OpenSolaris equivalent to grabbing just the kernel source for Linux.
--
Darren J Moffat
Re: ZFS and deduplication?
Posted:
May 22, 2009 9:16 AM
in response to: joachims
To: Communities » zfs » code
> When you 'pkg image-update' it will update everything consistently so
> the kernel is updated as well as all the matching commands and libraries.
> Unlike Linux, the OpenSolaris kernel isn't separate but is part of the
> same source base as libc and many of the core commands. The builds in
> the /dev repository are a carefully lined up collection of all the source
> bases (consolidations) that together form a build.
Got it. So this means that if I wait for deduplication to show up in the OS dev repository, then I can install 2009.06 and then run pkg image-update. This would update me to build 125 (or whatever) with deduplication, although I'd have to wait for 2010.02 (or whatever) to have that functionality in a "non-dev" version.
> There is on OpenSolaris equivalent to grabbing just the kernel source
> for Linux.
I assume you mean "no" OpenSolaris equivalent.
Thanks for the explanation!
Re: ZFS and deduplication?
Posted:
May 26, 2009 11:26 AM
in response to: joachims
To: Communities » zfs » code
Update for anyone who's keeping tabs on this - according to http://au.sun.com/sunnews/events/2009/kernel/speakers.jsp the demonstration will be by both Jeff Bonwick and Bill Moore at 9:15am local time on 7/15.
That's 6:15pm CDT / 7:15pm EDT on the 14th here in America.
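For anyone who wants to double-check the conversion, here is a small Python sketch. The fixed UTC offsets are assumptions on my part rather than anything from the conference page: Brisbane at UTC+10 (no daylight saving), and US Central and Eastern at UTC-5 and UTC-4, since both are on daylight time in July.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets (not looked up from a tz database):
# Brisbane does not observe daylight saving; the US zones are on
# daylight time in mid-July.
BRISBANE = timezone(timedelta(hours=10), "AEST")
US_CENTRAL = timezone(timedelta(hours=-5), "CDT")
US_EASTERN = timezone(timedelta(hours=-4), "EDT")

# Keynote slot as listed: 9:15am local time on July 15, 2009.
keynote = datetime(2009, 7, 15, 9, 15, tzinfo=BRISBANE)


def local(dt, tz):
    """Render an aware datetime in another zone."""
    return dt.astimezone(tz).strftime("%Y-%m-%d %H:%M %Z")


central = local(keynote, US_CENTRAL)
eastern = local(keynote, US_EASTERN)
print(central)  # 2009-07-14 18:15 CDT
print(eastern)  # 2009-07-14 19:15 EDT
```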
Re: [zfs-code] ZFS and deduplication?
Posted:
May 26, 2009 3:06 PM
in response to: bjquinn
On Tue, 26 May 2009 11:26:49 -0700 (PDT) BJ Quinn <bjquinn at seidal dot com> wrote:
> Update for anyone who's keeping tabs on this - according to http://au.sun.com/sunnews/events/2009/kernel/speakers.jsp the demonstration will be by both Jeff Bonwick and Bill Moore at 9:15am local time on 7/15.
>
> That's 6:15pm CDT / 7:15pm EDT on the 14th here in America.
Hi BJ,
As the organiser of KCA, I'd just like to clarify that while I'm hoping Jeff and Bill demonstrate ZFS deduplication in their keynote, I'm not guaranteeing that they will :-)
Registrations for KCA are now open, btw.
For the full agenda with abstracts please visit http://wikis.sun.com/display/KCA2009/KCA2009+Conference+Agenda
For the conference website please visit http://au.sun.com/sunnews/events/2009/kernel,
and for registrations, please go without delay to https://www.conveneit.com/secure/sun/kernel_jul_09.
The pricing is very reasonable:
students AUD95
earlybird AUD195 (ends 31st May 2009)
regular AUD300
cheers,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Kernel Conference Australia - http://au.sun.com/sunnews/events/2009/kernel
Re: [zfs-code] ZFS and deduplication?
Posted:
Jun 24, 2009 12:32 PM
in response to: jmcp
To: Communities » zfs » code
Will the slides by the speakers be available for download? Maybe with a holdback of a couple of months?
Re: [zfs-code] ZFS and deduplication?
Posted:
Jul 19, 2009 4:34 PM
in response to: jmcp
To: Communities » zfs » code
Any slides, transcript, blog posts, or any other information somewhere on the deduplication keynote?
Re: [zfs-code] ZFS and deduplication?
Posted:
Aug 14, 2009 8:01 AM
in response to: jmcp
To: Communities » zfs » code
Do we know when dedupe will show up in SXCE and whether it will make it in 2010.02 or not?
Also, I've noticed that the dedupe presentations still haven't been posted yet. Is there anywhere else I can go to find out some more details about the dedupe implementation? A google search turns up surprisingly little (nothing, actually), with basically the same search results from before KCA.
Not trying to nag, just dying to play around with this feature!
Re: [zfs-code] ZFS and deduplication?
Posted:
Aug 14, 2009 9:41 AM
in response to: bjquinn
> Do we know when dedupe will show up in SXCE and whether it will make it in 2010.02 or not?
Well, the first part of the question is actually easy to answer since we know that SXCE will be phased out ~ October. So the answer is "no".
> Also, I've noticed that the dedupe presentations still haven't been posted
> yet. Is there anywhere else I can go to find out some more details about
> the dedupe implementation? A google search turns up surprisingly little
> (nothing, actually), with basically the same search results from before
> KCA.
Nothing more fun than a good conspiracy theory. :-)
Regards -- Volker
--
Volker A. Brandt
Consulting and Support for Sun Solaris
Brandt & Brandt Computer GmbH
WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim
Email: vab@bb-c.de
Commercial register: Amtsgericht Bonn, HRB 10513
Shoe size: 45
Managing directors: Rainer J. H. Brandt and Volker A. Brandt
Re: [zfs-code] ZFS and deduplication?
Posted:
Aug 14, 2009 1:16 PM
in response to: vab
To: Communities » zfs » code
Just wanting some reading on dedupe, whether it's from the KCA presentations or otherwise. I was "virtually" present at the KCA, but the sound was so bad during the keynote that I couldn't really understand what was being said.
I don't have a conspiracy theory - there just isn't any good info on dedupe that I can find. I'd love to learn more about it or try it out. I've read the other conspiracy theory posts. That's not how I think. :)
I understand that SXCE will be phased out, but I thought possibly an early version of dedupe would be available before October. Alternatively, I guess I could ask if anyone knows when it will be available to play with in whatever we call the "beta/testing/development/non-release" version of OpenSolaris.
Then again, conspicuously, the audio was bad ONLY for the keynote, ALL the presentations have been released EXCEPT for the keynote, Oracle wants to kill OpenSolaris, and GreenBytes has a secret deal with Sun to destroy dedupe, ZFS and open source in general, and to just overall destroy the free world and democracy as we know it. Just kidding. :)
Re: [zfs-code] ZFS and deduplication?
Posted:
Sep 3, 2009 11:29 AM
in response to: bjquinn
To: Communities » zfs » code
Hey James, if the delay in publishing the information on the keynote is due to a lack of time on your part, send me the raw video/slides/etc. and I'd be more than willing to put together something nice that you could present on the KCA website.
Re: [zfs-code] ZFS and deduplication?
Posted:
Oct 14, 2009 3:50 PM
in response to: bjquinn
To: Communities » zfs » code
I don't suspect a conspiracy; I suspect some legal issues, of which I've heard rumblings. Is there any update on this? I'm very anxious to see deduplication implemented in ZFS!