CyberArchaeology in Afghanistan

Posted by Chris Samuel on Jul 25th, 2008

David Thomas at La Trobe University here in Melbourne has been using Google Earth to do archaeological research in Afghanistan. An excellent idea given how inaccessible the country is at the moment, and one that could also be useful in other places, such as Iraq.

Google to host Open Source scientific data sets

Posted by Chris Samuel on Jan 20th, 2008

Now this sounds really interesting..

Sources at Google have disclosed that the humble domain, http://research.google.com, will soon provide a home for terabytes of open-source scientific datasets. The storage will be free to scientists and access to the data will be free for all.

They may also provide data viz tools..

Building on the company’s acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information.

There is more information (including why Google intend to import data by shipping RAID arrays around the world) here and (more up to date) here.

We live in exciting times!

Google Code Search

Posted by Chris Samuel on Jan 13th, 2008

If you’re ever looking around for a piece of code to do something, then you should try Google’s Code Search.

For example, say I’m looking for some C code to parse RFC 2822 mail headers (which, strangely enough, I am). I go to Code Search and enter the query lang:c rfc2822

That gives me back a bunch of results. Now say I want something with a BSD license to use with Vacation: I just extend the search with a license:bsd term, which gives me the great news that SMail (which I used to run 13-14 years ago now) has a librfc2822 directory that deserves further investigation!

Google Toilet ISP (beta) (Updated)

Posted by Chris Samuel on Apr 1st, 2007

Chalk up another great Google April Fool.. :-)

Google TiSP (BETA) is a fully functional, end-to-end system that provides in-home wireless access by connecting your commode-based TiSP wireless router to one of thousands of TiSP Access Nodes via fiber-optic cable strung through your local municipal sewage lines.

Google Toilet ISP - Going with the Flow

Google - turning the fear that the Internet is a sewer into reality. :-)

Update: Google have another one, Google Paper, which I found via Wikipedia’s list of April Fools’ Day jokes for 2007.

Google Paper on Hard Disk Failures (Updated)

Posted by Chris Samuel on Feb 17th, 2007

Eugen Leitl posted an interesting paper from Google to the Beowulf list, Failure Trends in a Large Disk Drive Population (PDF), where “large” means in excess of 100,000 drives. The paper’s abstract says:

Our analysis identifies several parameters from the drive’s self monitoring facility (SMART) that correlate highly with failures. Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures. Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported.

Some of the Beowulfers have offered constructive criticism of the paper, including an interesting comment from rgb:

How did they look for predictive models on the SMART data? It sounds like they did a fairly linear data decomposition, looking for first order correlations. Did they try to e.g. build a neural network on it, or use fully multivariate methods (ordinary stats can handle it up to 5-10 variables).

and from Mark Hahn:

funny, when I saw figure5, I thought the temperature effect was pretty dramatic. in fact, all the metrics paint a pretty clear picture of infant mortality, then reasonably fit drives surviving their expected operational life (3 years). in senescence, all forms of stress correlate with increased failure. I have to believe that the 4/5th year decreases in AFR are either due to survival effects or sampling bias.

It will be interesting to see if they take notice of this open source peer review as there is at least one person from Google on the list.

Update: There is also a Usenix paper on hard disk failures that looks at different hard disk types.

ZFS Disk Mirroring, Striping and RAID-Z

Posted by Chris Samuel on Jan 1st, 2007

This is the third in a series of tests[1], but this time we’re going to test how ZFS handles multiple drives natively, rather than running over an existing software RAID+LVM setup. ZFS can dynamically add disks to a pool for striping (the default), mirroring or RAID-Z (with single or double parity), which are designed to improve speed (striping), reliability (mirroring), and performance and reliability (RAID-Z).

  1. The previous ones are “ZFS on Linux Works!” and “ZFS versus XFS with Bonnie++ patched to use random data”.

ZFS versus XFS with Bonnie++ patched to use random data

Posted by Chris Samuel on Jan 1st, 2007

I’ve patched Bonnie++[1] to use a block of data from /dev/urandom instead of all zeros for its block write tests. The intention is to see how the file systems react to less predictable data and to remove the unfair advantage that ZFS gets from compression[2].

  1. It’s not ready for production use, as it isn’t controlled by a command-line switch and relies on /dev/urandom existing.
  2. Yes, I’m going to send the patch to Russell to look at.

ZFS on Linux Works! (Update 3)

Posted by Chris Samuel on Dec 30th, 2006

Here’s my quick experience of trying out the ZFS alpha release with write support. First I built and installed ZFS, then ran the run.sh script to start the FUSE process that the kernel uses to provide the ZFS file system. Then the fun really begins.

2006
Dec 30

Ricardo Correia has announced on his blog, which follows his port of Sun’s Solaris ZFS to Linux using FUSE, that he has released an alpha with working write support:

Performance sucks right now, but should improve before 0.4.0 final, when a multi-threaded event loop and kernel caching support are working (both of these should be easy to implement, FUSE provides the kernel caching).

He may be a little modest about performance; one commenter (Stan) wrote:

Awesome! I compared a zpool with a single file (rather than a partition) compared to ext2 on loopback to a single file. With bonnie++, I was impressed to see the performance of zfs-fuse was only 10-20% slower than ext2.

Stan then went and tried another interesting test:

For fun, check out what happens when you turn compression on and run bonnie++. The bonnie++ test files compress 28x, and the read and write rates quadruple! It’s not a realistic scenario, but interesting to see.

Ricardo’s list of what should be working in this release is pretty impressive:

  • Creation, modification and destruction of ZFS pools, filesystems, snapshots and clones.
  • Dynamic striping (RAID-0), mirroring (RAID-1), RAID-Z and RAID-Z2.
  • It supports any vdev configuration which is supported by the original Solaris implementation.
  • You can use any block device or file as a vdev (except files stored inside ZFS itself).
  • Compression, checksumming, error detection, self-healing (on redundant pools).
  • Quotas and reservations.

Read his STATUS file to find out what isn’t working yet (the main omission I spotted was zfs send and recv).

Caveat: this is an alpha release, so it might eat your data.

Now Using Google’s tcmalloc

Posted by Chris Samuel on Dec 15th, 2006

As another performance experiment (not content with just PHP 5.2.0), I’ve now got Apache (and hence mod_fcgid and PHP) using tcmalloc from Google’s PerfTools through the simple expedient of adding:

export LD_PRELOAD=/usr/local/google-perftools/lib/libtcmalloc.so

to /etc/default/apache2. Again, if you see any problems, let me know!

Thanks to Mikal for the, ahem, pointer. :-)