The Musings of Chris Samuel

The Thoughts and Feelings of a Melbourne Person


SMP implementation of bzip2

Here’s something of a find, courtesy of Jordan Mendler on the ZFS/FUSE mailing list: an SMP implementation of bzip2 called pbzip2, by Jeff Gilchrist. From its description:

PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer (ie: anything compressed with pbzip2 can be decompressed with bzip2). PBZIP2 should work on any system that has a pthreads compatible C++ compiler (such as gcc). It has been tested on: Linux, Windows (cygwin & MinGW), Solaris, Tru64/OSF1, HP-UX, and Irix.
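To get a feel for why the output can stay compatible with plain bzip2, here is a minimal Python sketch of the general idea as I understand it: cut the input into chunks, compress each chunk independently as its own bzip2 stream, and concatenate the streams in order. This is not pbzip2's actual code (that is C++ with pthreads); the function name `parallel_bz2_compress` and the `CHUNK_SIZE` constant are made up for illustration, and the chunk size is only an assumption meant to resemble a 900k block size. Threads are used on the assumption that Python's `bz2` module releases the interpreter lock while compressing.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size, assumed to resemble a 900k block size.
CHUNK_SIZE = 900 * 1000


def parallel_bz2_compress(data: bytes, workers: int = 4,
                          chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress each chunk as a separate bzip2 stream and join them in order."""
    # Always produce at least one (possibly empty) chunk so the result
    # is a valid bzip2 file even for empty input.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the streams line up with the data.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 100_000
    packed = parallel_bz2_compress(sample)
    # A standard decompressor reads the concatenated streams back as one file.
    assert bz2.decompress(packed) == sample
    print(len(sample), "bytes in,", len(packed), "bytes out")
```

Because each chunk is compressed on its own, the result can differ slightly in size from a single-stream bzip2 run over the same data, but any decompressor that accepts concatenated streams should read it back unchanged.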

It’s packaged in Ubuntu (in Universe), and testing it against the standard bzip2 on this quad core Intel box (2.4GHz with 4GB RAM) with a 712MB tar file showed pretty impressive performance!

Standard bzip2 compression:

chris@quad:/tmp$ time bzip2 -v backup-20020122.tar
backup-20020122.tar: 1.531:1, 5.227 bits/byte, 34.66% saved, 746250240 in, 487572628 out.

real 2m32.331s
user 2m29.593s
sys 0m0.976s

Standard bzip2 decompression:

chris@quad:/tmp$ time bunzip2 -v backup-20020122.tar.bz2
backup-20020122.tar.bz2: done

real 0m56.215s
user 0m54.519s
sys 0m1.136s

Parallel bzip2 compression:

chris@quad:/tmp$ time pbzip2 -v backup-20020122.tar
Parallel BZIP2 v1.0.1 - by: Jeff Gilchrist [http://compression.ca]
[Mar. 20, 2007] (uses libbzip2 by Julian Seward)

# CPUs: 4
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name: backup-20020122.tar
Output Name: backup-20020122.tar.bz2

Input Size: 746250240 bytes
Compressing data...
Output Size: 487531723 bytes
-------------------------------------------

Wall Clock: 41.335455 seconds

real 0m41.338s
user 2m40.962s
sys 0m2.248s

Parallel bzip2 decompression:

chris@quad:/tmp$ time pbzip2 -v -d backup-20020122.tar.bz2
Parallel BZIP2 v1.0.1 - by: Jeff Gilchrist [http://compression.ca]
[Mar. 20, 2007] (uses libbzip2 by Julian Seward)

# CPUs: 4
-------------------------------------------
File #: 1 of 1
Input Name: backup-20020122.tar.bz2
Output Name: backup-20020122.tar

BWT Block Size: 900k
Input Size: 487531723 bytes
Decompressing data...
Output Size: 746250240 bytes
-------------------------------------------

Wall Clock: 18.078961 seconds

real 0m18.081s
user 1m3.516s
sys 0m1.776s

So that’s almost a 3.7x speedup for compression (2m32s down to 41s) and about a 3.1x speedup for decompression (56s down to 18s) over the single CPU version, not bad!
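For anyone wanting to check the arithmetic, the speedup is just the serial wall-clock ("real") time divided by the parallel one. Here is a small Python sketch that does the sums from the `time` output in the transcripts above; the helper names `to_seconds` and `speedup` are my own invention, and the "of linear" figure assumes all 4 CPUs were available to the job.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a shell `time` value such as '2m32.331s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(serial: str, parallel: str) -> float:
    """Ratio of serial to parallel wall-clock time."""
    return to_seconds(serial) / to_seconds(parallel)


# "real" times copied from the bzip2 and pbzip2 runs above.
RUNS = {
    "compression": ("2m32.331s", "0m41.338s"),
    "decompression": ("0m56.215s", "0m18.081s"),
}
CPUS = 4  # as reported by pbzip2 ("# CPUs: 4")

if __name__ == "__main__":
    for name, (serial, parallel) in RUNS.items():
        ratio = speedup(serial, parallel)
        print(f"{name}: {ratio:.2f}x speedup, "
              f"{ratio / CPUS:.0%} of linear on {CPUS} CPUs")
```

Comparing the "user" times the same way gives a rough idea of the overhead: pbzip2 burned a bit more total CPU time than bzip2 did, but spread it across the cores.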

Oh, and yes, there is an MPI version available too, called mpibzip2. :-)

5 Responses to “SMP implementation of bzip2”

  1. Jeff Waugh:

    Nice find!

    I’ve enjoyed reading your ZFS/FUSE posts too — thought about offering an article to LWN, given that you’ve written about it, and one of your posts was worthy of LWN linkage? ;-)

  2. Tim Freeman:

    Cool. I looked for a gzip equivalent and found PIGZ, which was released this year, but I have not tried to compile and run it yet. Search for: pigz gzip

  3. Chris Samuel:

    @Jdub: Not considered it, to be honest I don’t know if my knowledge is up to it!

    @Tim: Neat - just gave it a go with that same tar file, the improvement isn’t as marked as the bzip2 one though! Standard gzip = 74s whilst pigz with 4 threads did it in 41s (and the default 32 threads takes longer, no surprise really).

  4. Chris Samuel:

    @Tim: forgot to add the link I used to get pigz, it was http://www.c10n.info/archives/505

  5. dropsafe : links for 2008-01-10:

    [...] SMP implementation of bzip2 at The Musings of Chris Samuel PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer (ie: anything compressed with pbz (tags: compression threads) [...]

Comments for this post will be closed on 2 January 2009.

Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Australia