ZFS Bug From Solaris Found in Linux FUSE Version and Fixed

Those who know me in my day job know that I’m pretty good at breaking things, so I guess I shouldn’t be surprised that I found a ZFS bug which originated in the OpenSolaris code base and had sat there unnoticed for about a year. The ZFS on Linux developer has now fixed the bug and sent a patch back upstream, so hopefully OpenSolaris will pick up a fix because of work done on Linux!

The good thing is that, because I found it on Linux running ZFS via FUSE, the bug didn’t take my system down when the ZFS filesystem daemon died. 🙂 http://www.csamuel.org/2007/06/19/zfsfuse-makes-it-to-linuxworld-and-lwn/

Must Remember for Future ZFS on Linux Testing..

Linus added support for block device based filesystems in 2.6.20, so it’ll be interesting to see what effect (if any) it will have on ZFS/FUSE, especially given that it’s named in the commit. 🙂

I never intended this, but people started using fuse to implement block device based “real” filesystems (ntfs-3g, zfs).

It looks like Ubuntu’s Feisty Fawn will ship with this, as the 2.6.20 kernels in its development versions show a fuseblk filesystem in /proc/filesystems once you’ve loaded the fuse module, and the fuse-utils package seems to support it too.
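If you want to check whether your own kernel has it, /proc/filesystems is the place to look; here’s a trivial sketch of that check in Python (assuming a Linux box with the fuse module already loaded):

```python
# List the filesystem types the running kernel knows about and see whether
# the new fuseblk type (block device based FUSE, 2.6.20+) is among them.
# Assumes the fuse module has already been loaded, e.g. with modprobe.
with open("/proc/filesystems") as f:
    fstypes = [line.split()[-1] for line in f if line.strip()]

if "fuseblk" in fstypes:
    print("fuseblk present - block device based FUSE filesystems supported")
else:
    print("fuseblk not found - kernel probably predates 2.6.20")
```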

Update: Sadly it appears that this isn’t much use for ZFS. 🙁

Usenix Paper on Hard Disk Failures

Another gem from the Beowulf mailing list, this time courtesy of Justin Moore, who is the Google employee on the list I referred to in an earlier post.

This one is a paper published at the 5th USENIX Conference on File and Storage Technologies looking at failure rates in a disk population of 100,000 drives – a similar scale to the Google paper but this time spread over various disk technologies including SATA, Fibre Channel and SCSI.

Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?

They estimate, from the data sheets, that the nominal annual failure rate should be 0.88%, but in reality found it to be in excess of 1%, with 2-4% being common and rates ranging all the way up to 13% (see below for a quick sketch of how a datasheet MTTF translates into that nominal rate). They also see something different from the infant mortality that Mark Hahn alluded to when commenting on the Google paper:

We also find evidence, based on records of disk replacements in the field, that failure rate is not constant with age, and that, rather than a significant infant mortality effect, we see a significant early onset of wear-out degradation. That is, replacement rates in our data grew constantly with age, an effect often assumed not to set in until after a nominal lifetime of 5 years.

Their conclusions give numbers to this, saying:

For drives less than five years old, field replacement rates were larger than what the datasheet MTTF suggested by a factor of 2-10. For five to eight year old drives, field replacement rates were a factor of 30 higher than what the datasheet MTTF suggested.
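As an aside, that nominal 0.88% figure is just the datasheet MTTF turned into an annualised rate; here’s a rough back-of-the-envelope sketch of the conversion (my arithmetic, using the 1,000,000 hour MTTF from the paper’s title):

```python
# Back-of-the-envelope conversion of a datasheet MTTF into a nominal
# annual failure rate (AFR), assuming failures arrive at a constant rate.
HOURS_PER_YEAR = 24 * 365  # 8760

def nominal_afr(mttf_hours):
    """Approximate AFR implied by a quoted MTTF."""
    return HOURS_PER_YEAR / mttf_hours

print("1,000,000 hour MTTF -> {:.2%} nominal AFR".format(nominal_afr(1000000)))
# Prints roughly 0.88%, versus the 2-4% (and up to 13%) seen in the field.
```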

For those interested in the perceived higher reliability of SCSI/FC drives over their SATA brethren, the paper has this to say:

In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks. This may indicate that disk-independent factors, such as operating conditions, usage and environmental factors, affect replacement rates more than component specific factors. However, the only evidence we have of a bad batch of disks was found in a collection of SATA disks experiencing high media error rates. We have too little data on bad batches to estimate the relative frequency of bad batches by type of disk, although there is plenty of anecdotal evidence that bad batches are not unique to SATA disks.

Usenix have published the full paper text online in either HTML form or as a PDF document, so take your pick and start reading!

Google Paper on Hard Disk Failures (Updated)

Eugen Leitl posted an interesting paper from Google to the Beowulf list, Failure Trends in a Large Disk Drive Population (PDF), where “large” is in excess of 100,000 drives. The paper abstract says:

Our analysis identifies several parameters from the drive’s self monitoring facility (SMART) that correlate highly with failures. Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures. Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported.

Some of the Beowulfers have come up with constructive criticism of the paper, including this interesting comment from rgb:

How did they look for predictive models on the SMART data? It sounds like they did a fairly linear data decomposition, looking for first order correlations. Did they try to e.g. build a neural network on it, or use fully multivariate methods (ordinary stats can handle it up to 5-10 variables).

and from Mark Hahn:

funny, when I saw figure 5, I thought the temperature effect was pretty dramatic. in fact, all the metrics paint a pretty clear picture of infant mortality, then reasonably fit drives surviving their expected operational life (3 years). in senescence, all forms of stress correlate with increased failure. I have to believe that the 4/5th year decreases in AFR are either due to survival effects or sampling bias.

It will be interesting to see if they take notice of this open source peer review as there is at least one person from Google on the list.

Update: There is also a Usenix paper on hard disk failures that looks at different hard disk types.

More Scary Australian Copyright Braindeadness..

The Association for Progressive Communications has a really interesting summary of the possible implications of new copyright legislation in Australia. They have a set of PDFs there that give a “risk matrix” for teens, families, small businesses and industry.

If you’ve ever wondered how a bunch of kids singing in a restaurant can turn into a criminal offence under copyright law then this is for you (especially if you own an iPod). Read ’em and weep..

(Via)

BBC Asking Should New Service Be Microsoft Only ?

The BBC Trust is currently carrying out a consultation exercise into their new “On Demand” TV services over the Internet in which they ask “How important is it that the proposed seven-day catchup service be available to consumers who are not using Microsoft software ?” (see question 5).

The accompanying PDF says:

In respect of the seven-day catch-up over the internet service, the files would require DRM to ensure that they were appropriately restricted in terms of time and geographic consumption. The only system that currently provides this security is Windows Media 10 and above. Further, the only comprehensively deployed operating system that currently supports Windows Media Player 10 and above is the Windows XP operating system. As a result of these DRM requirements the proposed BBC iPlayer download manager element therefore requires Windows Media Player 10 and Windows XP. This means the service would be unavailable to a minority of consumers who either do not use Microsoft or do not have an up-to-date Microsoft operating system. However, over time, technology improvements are likely to enable even more efficient methods of delivery. Further, it is our understanding the BBC Executive are working towards the iPlayer download manager being able to function on other operating systems.

and go on to say:

We also note that the Microsoft-based strategy for rights management will limit usage. Normally, we would expect BBC services to be universally available, as universal access to BBC services is in the public interest. However, as set out above, other mainstream technology platforms do not currently provide the appropriate security.

So the BBC Trust do want greater usage, but don’t seem to understand that DRM will get in the way of that even for people who do have access to Windows.

People may want to make their feelings known on this..

(Via Alec)

Why You Should Fear Microsoft

There’s been an ongoing discussion on the Beowulf list for Linux clusters about SGI and Windows clusters (which I’ve not had a chance to read), but as part of it the inimitable Robert G. Brown (or one of the AI bots he must use to keep up his prolific and ever useful posting rate) wrote a lengthy and very interesting piece about why he is, and why others should be, afraid of Microsoft’s dominance. It was written in response to a posting from a Microsoft employee, which is in itself an interesting turn up.

He makes lots of references to “hydraulic monopolies”, so it is worth reading up on hydraulic empires for some background to the historical context.

One point he makes is about their impact on pension funds:

Finally, there is Microsoft and pension plans and the general stock market. This is perhaps the scariest part of Microsoft’s supermonopoly status, one that a gentleman named Bill Parrish seems to have devoted himself to uncovering and laying bare to an obviously uncaring world. Microsoft stock is a rather huge component of stock owned by both pension plans and individual “S&P Index” investors (and individuals) all over the world. If Microsoft stock were to collapse, or even to slip steadily down in nominal value, the economic consequences would be catastrophic. It would make the collapse of Enron look tame by comparison, because Microsoft is considerably larger at baseline than Enron ever was. This creates a HUGE disincentive for individuals and companies to challenge Microsoft’s hydraulic legacy — Microsoft has essentially tied the future well being and wealth of an entire generation of corporate employees and index fund investors to their own continued success.

Here he is drawing on an essay by the aforementioned Bill Parish, written as an editorial for Barron’s (from the WSJ people) in 2003 and available online, in which Mr Parish writes:

For anyone owning a S&P 500 index fund, Microsoft automatically was almost 4% of their investment. Microsoft’s stock has since declined 58.5%, from $58.38 a share on Dec. 31, 1999,(adjusted for a subsequent split) to $24.21 on March 31. That’s a loss of more than $363 billion, an amount exceeding the gross national product of all but a few nations. The loss also happens to be almost five times the total market value of Enron at its peak.

For reference, MSFT are currently trading at US$30.74 with a market capitalisation of US$302.19 billion. That’s about twice the GDP of Ireland and half the GDP of Australia.
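As a quick sanity check on the figures in that quote (my own back-of-the-envelope arithmetic, not Mr Parish’s):

```python
# Verify the percentage decline quoted above and see how many shares
# outstanding the ~$363 billion loss implies (rough arithmetic only).
peak, low = 58.38, 24.21  # split-adjusted MSFT share prices from the essay

decline = (peak - low) / peak
print("Decline: {:.1%}".format(decline))  # ~58.5%, as the essay says

implied_shares = 363e9 / (peak - low)
print("Implied shares outstanding: {:.1f} billion".format(implied_shares / 1e9))
# Around 10.6 billion shares are implied by the quoted figures.
```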

Rob has kindly granted permission for its reproduction here, but he retains copyright.


Vacation 1.2.6.3 released

This is a minor bugfix release to the 1.2.6 series of Vacation inspired by looking at the sorts of things Linux distros patch for their own usage.

Vacation no longer builds with -m486 by default, though it will still build as 32-bit on AMD64/EM64T because GDBM databases are not portable between 32-bit and 64-bit builds, and running a 64-bit vacation against a GDBM database created by a 32-bit build causes it to fail whilst syslogging a success message. This is sub-optimal.

The Makefile’s CFLAGS handling has been tidied up a fair bit as a consequence, which will hopefully make life a little easier for distributors, and it no longer tries to strip the vaclook Perl script on install, which was very silly.

Vacation also now accepts the -i option as well as -I to initialise its database.

Download from SourceForge here.
