ZFS on Linux Works! (Update 3)

Here’s my quick experience trying out the ZFS alpha release with write support. First I built and installed ZFS, then ran the run.sh script to start the FUSE process that the kernel uses to provide the ZFS file system. Then the fun really begins.

Test platform: Ubuntu Edgy Eft (6.10) with a pair of Seagate Barracuda 7200.9 300GB drives on a Silicon Image SiI 3112 SATA controller, arranged as a software RAID-1 mirror (MD driver) with LVM2 on top to provide logical volumes for file systems.
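
For reference, the state of the underlying mirror can be checked before piling ZFS on top of it; this is just the usual MD check rather than anything ZFS specific:

root@inside:~# cat /proc/mdstat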

First we need a logical volume to play with. Since I use LVM over a software RAID-1 mirror (using the MD driver) that’s pretty easy. I’ll make it a 10GB volume so we’ve got space to play with:

root@inside:~# lvcreate -L 10G -n ZFS /dev/sata
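
If you want to double check the new logical volume before handing it over to ZFS, lvdisplay will show it (the volume group is called “sata”, hence the device path):

root@inside:~# lvdisplay /dev/sata/ZFS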

Now that we’ve got some raw storage to play with, we need to create a ZFS pool, which will also become a top-level directory (something I didn’t realise initially). So we’ll create a pool called “test”, which will also create a /test directory.

root@inside:~# zpool create test /dev/sata/ZFS
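
If you just want the headline numbers rather than the full status below, zpool list gives a one line summary of the pool:

root@inside:~# zpool list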

OK – so what does zpool say about its status?

root@inside:~# zpool status

  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          /dev/sata/ZFS  ONLINE       0     0     0

errors: No known data errors

Well that’s good; it’s told us it hasn’t spotted any errors yet. 🙂
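
If you’d rather make it go looking for problems than wait for it to trip over them, a scrub can be kicked off by hand and its progress will then show up in the zpool status output:

root@inside:~# zpool scrub test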

So we’ve got a pool; now we need to allocate some of it to a file system. To make it easy we won’t set a limit now, as (I believe) one can be applied later. We’ll call this volume1 and it’ll appear as /test/volume1.

root@inside:~# zfs create test/volume1

That’s created the area, made the file system and mounted it for us. Not bad, eh? Here’s the proof; the next command I typed was:

root@inside:~# zfs list

NAME           USED  AVAIL  REFER  MOUNTPOINT
test           114K  9.78G  25.5K  /test
test/volume1  24.5K  9.78G  24.5K  /test/volume1
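
As mentioned above, I believe a limit can be applied to a file system after the fact by setting a quota on it, along these lines (the 5G figure is just an example):

root@inside:~# zfs set quota=5G test/volume1   # 5G is only an example figure
root@inside:~# zfs get quota test/volume1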

Now we’ll give it some real work to do using Russell Coker’s excellent bonnie++ disk benchmarking tool, which tests a heap of I/O characteristics (not all of which we’ll run here because it would take too long).

First of all we’ll go into the ZFS file system we just created.

root@inside:~# cd /test/volume1/

Now we’ll run bonnie++ and tell it to only run in “fast” mode, which skips the per-character I/O tests (life’s too short). I also need to tell it to really run as root, but only because I was too lazy to change the directory owner to my real user. Ahem. 🙂
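
For anyone less lazy than me, the alternative would be something like this (substitute your real username for “youruser”), after which a plain bonnie++ -f run as that user would do:

root@inside:~# chown youruser /test/volume1   # youruser = your real account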

root@inside:/test/volume1# time bonnie++ -f -u root

This is the result!

Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
inside           2G           13455   1  6626   1           24296   1  58.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1832   4  5713   4  1394   2  1955   4  8804   6  1709   3
inside,2G,,,13455,1,6626,1,,,24296,1,58.7,0,16,1832,4,5713,4,1394,2,1955,4,8804,6,1709,3

real    12m27.073s
user    0m1.236s
sys     0m9.405s

Not too bad for an alpha release of a file system: it ran to completion with no errors or crashes!

Now we need an idea of how a comparable file system performs on the same hardware, so as a comparison I ran bonnie++ on an XFS partition that also sits on an LVM logical volume. This is how it performed (( The original version of this test, with Beagle still running, took 8m 22s )):

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
inside           2G           42738  11 20034   5           42242   5 261.6   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1614   4 +++++ +++  1550   3  1236   3 +++++ +++   207   0
inside,2G,,,42738,11,20034,5,,,42242,5,261.6,1,16,1614,4,+++++,+++,1550,3,1236,3,+++++,+++,207,0

real    5m53.601s
user    0m0.292s
sys     0m16.473s

So XFS is significantly faster for most operations, though interestingly ZFS was quicker for all the create and delete tests except the sequential delete.

Now, given a previous comment on the ZFS blog about the impact of compression on performance, I thought it would be interesting to try it out for myself. First you turn it on with:

root@inside:/test/volume1# zfs set compression=on test

(how easy was that?) and re-ran bonnie++. What I got really surprised me!

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
inside           2G           13471   1 11813   2           72091   4  1169   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1707   4  4501   3  1520   3  1590   4 10065   6  1758   3
inside,2G,,,13471,1,11813,2,,,72091,4,1169.1,2,16,1707,4,4501,3,1520,3,1590,4,10065,6,1758,3

real    6m59.717s
user    0m1.200s
sys     0m8.813s

So this is significantly faster than the run without compression (( Originally, before I realised about Beagle, it looked faster than XFS )). Now, admittedly this is a synthetic test, and I presume that Bonnie++ is writing files padded with zeros (( it is )) (or some other constant) rather than with random data, but I was still pretty amazed.
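
Incidentally, if you want to confirm what’s been set, or see how much space compression is actually saving on real data, zfs get will tell you; compression is inherited by child file systems and compressratio is a read-only property reporting the achieved ratio:

root@inside:~# zfs get compression test
root@inside:~# zfs get compressratio test/volume1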

Copying a sample ISO image (Kubuntu 7.04 alpha) from /tmp was a little more realistic, with XFS taking about 33 seconds and ZFS with compression taking almost 2m 18s. Disabling compression brought that down to around 1m 55s. Next up was another old favourite, untar’ing a bzip2’d Linux kernel tarball (in this case 2.6.19.1). This was done with:

time tar xjf /tmp/linux-2.6.19.1.tar.bz2

XFS took just under 1m 22s (( originally 2m 21s with Beagle running )) whilst ZFS took 1m 30s without compression and 1m 27s with. So a pretty even match there.

Removing the resulting kernel tree took just over 9s (( originally 24s with Beagle )) on XFS, 14s on ZFS without compression and just under 19s with compression.
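
For completeness, the removal is nothing fancier than a timed recursive delete, run from the directory the tarball was unpacked into:

time rm -rf linux-2.6.19.1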

I have to say I’m very impressed with what Ricardo has managed to do with this and I really look forward to future releases that he says will improve performance! I’m also quite impressed with the management tools for ZFS!

Update 1

Tested again, this time LD_PRELOAD’ing the zfs-fuse process against Google’s tcmalloc library to see what happens.
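
For anyone wanting to try the same, the preload just needs to be in the environment of the zfs-fuse process when run.sh starts it, something along these lines (the library path and name here are a guess and will vary by distro):

root@inside:~# LD_PRELOAD=/usr/lib/libtcmalloc.so ./run.sh   # library path is a guess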

With compression it was almost a full minute quicker!

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
inside           2G           16500   1 13219   2           82316   5 918.1   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2130   5  7636   5  1609   3  1994   3 13136   9  1821   4
inside,2G,,,16500,1,13219,2,,,82316,5,918.1,2,16,2130,5,7636,5,1609,3,1994,3,13136,9,1821,4

real    6m3.706s
user    0m1.108s
sys     0m8.677s

Now without compression it’s over 1m 30s quicker too:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
inside           2G           15158   1  7698   1           30611   2  74.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1436   3  5925   4  1741   3  1214   2  7217   6  1761   3
inside,2G,,,15158,1,7698,1,,,30611,2,74.0,0,16,1436,3,5925,4,1741,3,1214,2,7217,6,1761,3

real    10m44.081s
user    0m1.072s
sys     0m8.645s

The kernel untar tests are now 1m 25s without compression and 1m 12s with compression, and the kernel tree remove is just under 15s with compression and just over 14s without.

Update 2

Mea culpa: I’d completely forgotten I had Beagle running, and for the XFS tests (which I ran as myself in a sub-directory of my home directory) it was helpfully trying to index the 2GB test files that Bonnie++ was creating! This severely handicapped the original XFS tests, and disabling it sped the whole thing up by almost 1.5 minutes!

Now I’m going to try testing both XFS and ZFS again, this time with a version of Bonnie++ patched to use random data for its writes rather than just 0’s.

Update 3

Corrected the times for untar’ing and removing the kernel tree under XFS now Beagle isn’t running to confuse matters.