I was wondering whether FUSE is the bottleneck in my various ZFS-FUSE tests, or whether the current performance issues simply reflect the fact that ZFS is very young code on Linux and that Riccardo hasn’t yet started on optimisation.
As a quick reminder, here’s what JFS can do on a software RAID 1 array on this desktop:
```
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
inside           2G           39388  11 24979   5           53968   6 255.4   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  9032  24 +++++ +++  8642  33  2621  18 +++++ +++   993   5
inside,2G,,,39388,11,24979,5,,,53968,6,255.4,1,16,9032,24,+++++,+++,8642,33,2621,18,+++++,+++,993,5

real    4m0.982s
user    0m0.292s
sys     0m17.201s
```
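For reference, output in that format comes from a Bonnie++ run along these lines (a sketch only; the mount point and user are assumptions, while -s 2048 and -n 16 correspond to the 2G size and 16 files shown above):

```sh
# Hypothetical invocation matching the figures above: a 2 GB test file size
# and 16*1024 small files, run inside the filesystem under test.
time bonnie++ -d /mnt/test -s 2048 -n 16 -u nobody
```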
…and here is how ZFS-FUSE compares…
```
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
inside           2G           18148   4  9957   3           28767   3 164.4   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  3093   6  6201   8  3090   4  2906   5  8592   9  4165   6
inside,2G,,,18148,4,9957,3,,,28767,3,164.4,0,16,3093,6,6201,8,3090,4,2906,5,8592,9,4165,6

real    7m59.385s
user    0m1.140s
sys     0m16.201s
```
That’s of the order of half the speed. So, how does NTFS-3G compare? Here are the results:
```
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
inside           2G           31222   6 14076   4           30118   2 137.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1780   4 15379  10  4521   8  3276   7 16683  14  4429   5
inside,2G,,,31222,6,14076,4,,,30118,2,137.5,0,16,1780,4,15379,10,4521,8,3276,7,16683,14,4429,5

real    6m14.292s
user    0m1.032s
sys     0m14.173s
```
So at first blush it looks to fall somewhere between the two, with a 6+ minute run time, but its write and rewrite speeds are substantially better than ZFS’s.
But where it gets really interesting is comparing NTFS-3G with my XFS results, below:
```
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
inside           2G           31444   9 15949   4           30409   4 261.3   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  3291  21 +++++ +++  2720  14  3286  22 +++++ +++   874   7
inside,2G,,,31444,9,15949,4,,,30409,4,261.3,1,16,3291,21,+++++,+++,2720,14,3286,22,+++++,+++,874,7

real    5m38.876s
user    0m0.380s
sys     0m19.645s
```
Here’s a table of results comparing the write, rewrite and read speeds of each.
Filesystem | Write (MB/s) | Rewrite (MB/s) | Read (MB/s)
---|---|---|---
JFS (kernel) | 39 | 25 | 54
XFS (kernel) | 31 | 16 | 30
NTFS-3G (FUSE) | 31 | 14 | 30
ZFS (FUSE) | 18 | 10 | 28
So that’s pretty conclusive: we have a FUSE filesystem (which also claims not to be optimised) that can pretty much match an in-kernel filesystem.
Maybe XFS would look better on a drive array more powerful than a RAID-1?
The people who use XFS for serious things seem to generally have arrays of 5 or more disks, and having many dozens of disks isn’t uncommon.
With RAID-1 every synchronous write will block all IO, while RAID-5 with a suitably advanced controller (i.e. not Linux software RAID-5) or RAID-10 will allow multiple writes to occur at the same time.
Of course most of the Bonnie tests won’t show this unless you use the -p/-y options.
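For anyone who wants to try that, here is a rough sketch of how Bonnie++’s semaphore options can be used to start several instances simultaneously (the mount points are assumptions; check the man page for the exact semantics of your version):

```sh
# Create a semaphore for three Bonnie++ workers; this instance does no testing itself.
bonnie++ -p 3

# Start three instances in parallel; -y makes each one wait on the semaphore
# so that they all begin their tests at the same moment.
for dir in /mnt/a /mnt/b /mnt/c; do
    bonnie++ -y -d "$dir" -s 2048 -n 16 -u nobody &
done
wait
```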
One thing that would be really interesting to see would be test results from the same filesystem in both FUSE and kernel versions. If someone made Ext3 run via FUSE it would be very useful to measure the FUSE overhead.
Well, first of all it’s all I’ve got to play with and I need the resilience. All the filesystems in question are handicapped in the same way. 🙂
One thing I’ve found recently is that with 4+P and 8+P arrays (four or eight data disks plus parity) you can avoid the read-before-write RAID-5 problem, so it can make sense to do RAID-0 striping across multiples of those rather than have a single large RAID-5 array that could end up having to read some drives to work out the new parity before writing the stripe.
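To put rough numbers on that (the 64 KB chunk size here is just an assumed example): with four data disks plus parity and 64 KB chunks, a full stripe of data is 4 × 64 KB = 256 KB, so a stripe-aligned write of 256 KB (or a multiple of it) lets the new parity be calculated entirely from the data being written, with no reads from the other drives.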
Very interesting. This is one of the first useful benchmarks on ZFS I’ve seen, even though it’s the FUSE port. Given the slight difference between NTFS-3G and XFS, while the difference from JFS is far higher, I’d love to see all of this compared to Ext3 on your hardware. The reason is that I’m used to Ext3 performance. Does that make sense to you? 🙂
Btw, the NTFS-3G website does have a benchmark of NTFS-3G, and there it was already shown that it is basically… very fast for a userspace FS.
The benchmark posted on Feb 21, 2007 on the ntfs-3g website uses version 1.0 of the driver; as can be seen, versions 1.3xx and more recent implement a new algorithm for cluster location. Said algorithm has the advantage of fragmenting big files much less, of copying those same files much faster and of reducing CPU and memory use.
My personal tests seem to indicate that ntfs-3g uses more CPU than the Microsoft NTFS driver, and is also more demanding than, say, ext3. However, it is much better at not fragmenting files than the native Microsoft file system (compared with the latest Vista version).
Areas for optimization could be found in better scaling (right now, it works very well on 40 GB partitions, but when I copy stuff onto a 120 GB partition, it really drags on my CPU) and more automated file management (automatic detection of the FS code page). Further optimization could be found in, say, automatic block allocation of frequently modified files at the top of the drive, and dynamic priority changes depending on system load and FIFO fill rate.
While it is compatible with Windows logical RAID partitions and is able to mount them under Linux, it has seen no optimization that I know of in this area; the block allocator could be (if it isn’t already) threaded so as to accelerate reads and writes on logical RAID setups.
Don’t get me wrong, I think the latest version (1.516 at the time of writing) is fantastic, but when the authors mention that the driver could be optimized, please consider that a file system is almost like a cat: there are multiple ways to skin one.
Just for the record, what’s the time for extracting a recent kernel tarball? Filesystems do kinda funny things when it comes to these …
Hi Jan, I’d played with doing that when I first started playing with ZFS but hadn’t revisited it since, plus I was extracting from a compressed archive which added some extra overhead.
So here are some new numbers for 2.6.19.1 for ZFS (with and without compression), XFS and JFS.
Extracting with
cat /tmp/linux-2.6.19.1.tar | time tar xf -
XFS: 40s
JFS: 27s
ZFS without compression: 86s
ZFS with compression: 77s
Removing the resulting source code tree with
time rm -rf linux-2.6.19.1
XFS: 19s
JFS: 25s
ZFS without compression: 20s
ZFS with compression: 15s
There you go!
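For anyone wanting to repeat this, here is a sketch of how those timings could be collected in one pass (the mount-point paths are assumptions):

```sh
# Time the kernel tarball extraction and removal on each test filesystem.
# The mount points below are examples only.
for fs in /mnt/xfs /mnt/jfs /mnt/zfs /mnt/zfs-compressed; do
    echo "== $fs =="
    cd "$fs" || continue
    cat /tmp/linux-2.6.19.1.tar | time tar xf -
    time rm -rf linux-2.6.19.1
done
```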
I’d guess JFS does not have barrier support (hence appearing faster). You could compare with mount -t xfs -o nobarrier. How zfs-fuse does (or doesn’t do) barriers, no idea 🙂
Hmm, all my XFS file systems report things like:
Filesystem "dm-9": Disabling barriers, not supported by the underlying device
Would using the nobarrier mount option make a difference in that case?

http://oss.sgi.com/projects/xfs/faq.html#wcache
A barrier makes sure metadata arrives on disk first, IIRC, hence it makes operations slower (but safe). Also see http://lkml.org/lkml/2006/5/22/278
Hi Jan,
I guess what I’m asking is: would there be any performance difference between me explicitly disabling write barriers at mount time and XFS disabling them itself because the underlying devices don’t support them?
If you can prove that explicitly disabling write barriers gives different performance from having write barriers disabled because the device doesn’t support them, then that would be a bug. It seems quite unlikely that there would be such a bug, but if you find one then please report it.
Apparently one of the worst things that can happen is when a device claims to support write barriers but doesn’t actually do so. Then the filesystem driver jumps through hoops to manage write barriers (with some performance cost) but the reliability is not provided.
I’m just saying, if you run a benchmark, either make sure that every fs got write barriers on (and working), or off, to get comparable results.
Just done some testing and can confirm there is no difference between with and without “nobarrier” on my hardware (which doesn’t support them).
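For reference, this is roughly how that check and comparison can be done (the device and mount point below are assumptions):

```sh
# See whether XFS reported disabling barriers when the filesystem was mounted.
dmesg | grep -i barrier

# Mount with write barriers explicitly disabled, then re-run the benchmark
# and compare against a mount without the option.
mount -t xfs -o nobarrier /dev/md1 /mnt/test
```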
Nice to see these benchmarks! 🙂
NTFS-3G is indeed completely unoptimized. I briefly mentioned the current major bottlenecks on http://lwn.net/Articles/238812/ but of course there are many more.

The similarity between the performance of XFS and NTFS-3G could be because I use XFS, and currently that’s the “baseline” performance during NTFS-3G development. Anything much worse than that is considered to be a usability bug, not a performance problem, so it gets more attention.
The comparison is valid when I/O is the real bottleneck, not the CPU. If the processor is too slow compared to the disk (e.g. embedded devices or high I/O throughput servers) then user space file systems will suffer a lot, since the performance support infrastructure in the kernel and FUSE isn’t fully developed and optimized yet.
The new, unfinished, unoptimized ntfs-3g block allocator introduced in version 1.328 helps if the volume is fragmented. I also noticed that Microsoft’s NTFS block allocator is fairly inefficient.
About the “scalability” (the 40 GB -> 120 GB disk experience): I think the reason for the high CPU usage is what I mentioned above: the bigger the disk, the faster the driver can go, so the CPU can be used more, which results in higher CPU usage. Thanks Mitch74 for the ideas.
Automatic detection of the FS code page is highly OS and environment specific. If the distribution or OS vendor sets it up properly before mounting an NTFS volume then the driver will work fine without needing the ‘locale=’ workaround mount option.
As for zfs-fuse: data should stay in the kernel, hence some of the ZFS code should also go there (e.g. end-to-end checksumming).
Hi Szabolcs, thanks for that information!
Unfortunately, because ZFS is under the CDDL it’s unlikely that any of its code will make it into the kernel for the moment (unless it’s derived from the GPL-licensed bits in GRUB).
Hi Chris,
I don’t think the amount of code which would need to be rewritten under a different licence for the optimization to be included in the kernel would be significant compared to a full rewrite, which is practically impossible and would take about “forever”. Let’s say full rewrite vs optimization: 100,000 vs 1,000 lines of code. In fact, some of the code which needs to be in the kernel is already under GPL2 (e.g. checksum, compression).
That’s very interesting. Sounds like we need to see about properly optimising the FUSE port then! 🙂
Of course, having a proper in-kernel implementation would still be the best solution, as then you’d be able to boot off ZFS and therefore use it exclusively. Still, it’d be nice to be able to store at least my data files on a ZFS partition.
You can boot from an NTFS partition using ntfs-3g because the kernel can read NTFS by default (thus it can load its image, mount the boot partition read-only, load ntfs-3g, then remount it read/write). If there were a read-only implementation of ZFS in-kernel, then it could already boot.
No need for in-kernel NTFS support to boot Linux from NTFS. GRUB supports this natively (ZFS too) and NTFS-3G implements bmap, which is needed by LILO. Several distributions use NTFS-3G for the root file system, for example WUBI (Windows Ubuntu Installer). When the kernel is booted with an initrd or initramfs it can mount a file system and pivot_root to it to make it the root file system.
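A very rough sketch of the initrd/initramfs steps being described (device names, paths and the exact hand-over tool are assumptions; real distributions wrap this in their own init scripts):

```sh
#!/bin/sh
# Inside the initramfs: mount the NTFS root with the userspace driver,
# then switch to it as the real root filesystem.
mount -t proc proc /proc
mount -t sysfs sysfs /sys
modprobe fuse                          # FUSE itself still lives in the kernel

mkdir -p /newroot
ntfs-3g /dev/sda1 /newroot             # mount the NTFS volume read/write

exec switch_root /newroot /sbin/init   # hand over to the real root
                                       # (older initrd setups use pivot_root)
```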
There are some minor issues, e.g. the order in which subsystems and processes are terminated during shutdown, but they are solvable and being worked on.
Er, booting from ZFS/FUSE under Linux already works, folks!
Riccardo commented on that feat saying:
🙂