Catching up on PLOA, I noticed a posting from Greg Black bemoaning the lack of ZFS in Linux, so I thought I should make a couple of quick points in response to it.
- The CDDL/GPL thing is just down to the fact that their requirements are incompatible (Sun based the CDDL on the MPL), so you can’t mix that code. Just have to live with that.
- A major issue with ZFS is that there is ongoing patent litigation in the US between Sun and NetApp over it – it’ll be interesting to see what Oracle do when they finally take over Sun (assuming Sun doesn’t expire before the EU regulators come to a decision on the takeover).
- ZFS-FUSE isn’t dead! Whilst Ricardo has stopped work another group has taken up the challenge and there is a new home page for it – http://rudd-o.com/new-projects/zfs – complete with Git repository (no more Mercurial, huzzah!).
- The ZFS-FUSE mailing list is active too, if you want to learn more.
After a reboot today whilst installing KDE 4.3.1 I noticed the following messages in my kernel (2.6.31-rc8) logs (courtesy of the KDE file watcher that was following /var/log/kern.log):
Sep 6 13:53:10 quad kernel: [ 142.842723] EXT4-fs error (device dm-7): ext4_mb_generate_buddy: EXT4-fs: group 287: 5812 blocks in bitmap, 5418 in gd
Sep 6 13:53:11 quad kernel: [ 143.452041] JBD: Spotted dirty metadata buffer (dev = dm-7, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Sep 6 13:53:11 quad kernel: [ 143.486915] JBD: Spotted dirty metadata buffer (dev = dm-7, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Sep 6 13:53:11 quad kernel: [ 143.486942] JBD: Spotted dirty metadata buffer (dev = dm-7, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
That didn’t look too good, so I immediately did a “git pull”, happily found 2.6.31-rc9 was out and built that. I then did a dual backup: rsync’ing to my local ZFS-FUSE drive (which takes snapshots so I can go backwards in time) and also an rsnapshot to a USB external disk. Then with trepidation I rebooted and found myself looking at an fsck error on /home due to shared blocks between an image and part of my local clone of Linus’s kernel git tree (d’oh!). Whilst the fsck got the filesystem mountable again, it left me unable to view the kernel git tree due to missing files, so I decided it was far safer to just revert to my latest backup, which worked like a charm (phew!).
Moral of the story – keeping backups is good – keeping lots of backups is even better, especially when running with release candidate kernels! 😉
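For the curious, an rsync-then-snapshot routine of that kind can be sketched roughly as below. This is a minimal illustration rather than the exact commands used here: the dataset name, the mountpoint, the `--delete` flag and the YYYYMMDD-HHMM timestamp format are all assumptions, and the functions only print the commands so they can be checked before being run for real.

```shell
#!/bin/sh
# Build a snapshot name such as ZFS/home@20080606-2201.
# The timestamp format is an assumption; pass a stamp to override "now".
snapshot_name() {
    dataset=$1
    stamp=${2:-$(date +%Y%m%d-%H%M)}
    echo "$dataset@$stamp"
}

# Print (not run) the two steps: mirror the source directory onto the
# ZFS filesystem, then freeze that state as a timestamped snapshot.
# All four arguments are hypothetical examples, not taken from the post.
backup_commands() {
    src=$1          # directory to back up, e.g. /home
    dataset=$2      # ZFS dataset, e.g. ZFS/home
    mountpoint=$3   # where that dataset is mounted, e.g. /srv/ZFS/home
    stamp=$4        # optional fixed timestamp
    echo "rsync -a --delete $src/ $mountpoint/"
    echo "zfs snapshot $(snapshot_name "$dataset" "$stamp")"
}

backup_commands /home ZFS/home /srv/ZFS/home
```

Because each snapshot only records the blocks that changed since the last one, keeping lots of them is cheap, which is what makes the "lots of backups" approach practical.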
Sam Varghese interviewed me about filesystems as part of an article about the rise of ext4; you can find his article “Enter ext4, the filesystem of the future” at ITwire.
A week ago I had a hairy crash when stopping ZFS/FUSE on my box, which I mentioned on the ZFS/FUSE mailing list. I upgraded from 184.108.40.206 to 2.6.26-rc7 and in the process blew away the kernel build tree for the 220.127.116.11 kernel to recover the disk space. Shortly after that I received a query off-list from Miklos Szeredi, the FUSE maintainer, asking if I could supply him a disassembly of the offending function from the kernel build – which was now consigned to the bit bucket. 🙁
Fortunately I’ve been regularly rsync’ing various important parts of my computer onto ZFS/FUSE partitions and snapshotting them with timestamps, so I (theoretically) was only a few commands away from getting to the defunct kernel tree once more. Unfortunately you can’t look at a ZFS/FUSE snapshot directly at the moment; it’s one of the parts that still has to be got working under Linux.
Luckily there is a trick to get access, which is to create a clone of the snapshot. The ZFS Administration Guide describes a clone thus:
A clone is a writable volume or file system whose initial contents are the same as the dataset from which it was created. As with snapshots, creating a clone is nearly instantaneous, and initially consumes no additional disk space. In addition, you can snapshot a clone.
The magic command to do this was just:
zfs clone ZFS/home@20080606-2201 ZFS/temp
and suddenly I had /srv/ZFS/temp, a fully working version of this machine’s /home directory as it was around 10pm on the 6th of June, and in it was the kernel tree.
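For anyone wanting to repeat the trick, the whole round trip might look something like the sketch below. The snapshot and clone names are the ones from above, and the /srv prefix follows the mountpoint just mentioned; the path being recovered and the destination directory are made up for illustration. The function only prints the commands, so nothing is touched until you run them yourself.

```shell
#!/bin/sh
# Print (not run) the commands to pull one path out of a ZFS snapshot
# by way of a temporary clone.
recover_from_snapshot() {
    snapshot=$1   # e.g. ZFS/home@20080606-2201
    clone=$2      # e.g. ZFS/temp, assumed to mount under /srv
    wanted=$3     # hypothetical path relative to the clone's mountpoint
    dest=$4       # hypothetical directory to copy the recovered files into

    # A clone is writable and near-instant, so it can be browsed like
    # any other filesystem.
    echo "zfs clone $snapshot $clone"
    echo "cp -a /srv/$clone/$wanted $dest"
    # Destroying the clone afterwards leaves the snapshot itself intact.
    echo "zfs destroy $clone"
}

recover_from_snapshot ZFS/home@20080606-2201 ZFS/temp src/linux /tmp/recovered
```

Cleaning up the clone once the files are copied out keeps the pool tidy, and the snapshot can always be cloned again later if something else turns out to be missing.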
After the previous benchmark of btrfs I thought it’d be interesting to revisit ZFS using FUSE under Linux, so after updating to the current tip (02d648b1676c) in the Mercurial trunk I created a 30GB LVM volume for testing and gave it a go. Now you can’t compare it to previous results as this is completely different hardware, but the numbers look quite respectable in comparison to the in-kernel file systems tested yesterday.
In July I was commissioned to write an article for LinuxWorld called “Emerging Linux Filesystems” which they published in early September in three parts. Part of the deal was that there was a 90 day exclusivity period for them before I could republish it elsewhere, which has now lapsed.
So you can now read the article in its original (single page) form complete with inline images and graphs and covering Ext4, NILFS, btrfs, Reiser4, ChunkFS and ZFS under both FUSE on Linux and OpenSolaris. Enjoy!
My thanks to Don Marti of LinuxWorld for commissioning (and paying for) the article and to Dragan Dimitrovici of Xenon Systems for the loan of the test system!
Recently LinuxWorld commissioned me to write an article on Emerging Linux Filesystems (the formatting is a bit different from the original I sent, but the slideshow of graphs now works) and have kindly given me permission to present a talk based on my work at the October Linux Users of Victoria (LUV) meeting.
So if you can make it you can hear about my experiences with ChunkFS, btrfs, NILFS, ext4, Reiser4 and ZFS/FUSE, as well as with ZFS under OpenSolaris (in this case Nexenta).
I’d also like to thank Dragan at Xenon Systems for the loan of a shiny, Linux friendly, test system!
An email interview with me following on from my ZFS/FUSE blogs has formed part of an article about ZFS/FUSE on LinuxWorld, which has also been noted on LWN.
I was wondering whether FUSE was being a bottleneck in my various ZFS-FUSE tests, or whether the performance issues at present are just down to ZFS being very young code on Linux and the fact that Ricardo hasn’t yet started on optimisation.
Here’s a quick update on my previous results for striping and RAID-Z when testing ZFS on an old system with multiple SCSI drives (see the previous post for details of the system config).