bzip2 is a freely available, patent-free (see below), high-quality data
compressor. It typically compresses files to within 10% to 15% of the best
available techniques (the PPM family of statistical compressors), whilst
being around twice as fast at compression and six times faster at decompression.
Why would I want to use it?
Because it compresses well. So it packs more stuff into your overfull disk
drives, distribution CDs, floppy disks, Zip disks, backup tapes, ... whatever.
And/or it reduces your phone bills, customer download times, long distance
network traffic, ... whatever. Pretty obvious really. Who's arguing? It's
not the world's fastest compressor, but it's still fast enough to be plenty
useful.
Because it's open-source (BSD-style license), and, as far as I know, patent-free.
(To the best of my knowledge. I can't afford to do a full patent search,
so I can't guarantee this. Caveat emptor). So you can use it for whatever
you like. Naturally, the source code is part of the distribution.
Because it supports (limited) recovery from media errors. If you are trying
to restore compressed data from a backup tape or disk, and that data contains
some errors, bzip2 may still be able to decompress those parts of the file
which are undamaged.
Because you already know how to use it. bzip2's command line flags are
similar to those of GNU Gzip, so if you know how to use gzip, you know
how to use bzip2.
Because it's very portable. It should run on any 32- or 64-bit machine with
an ANSI C compiler. The distribution should compile unmodified on Unix
and Win32 systems. Earlier versions have been ported with little
difficulty to a large number of weird and wonderful systems.
Because the documentation tells you how and to what extent I've tested
it, and you can decide for yourself whether or not to entrust your data
to it. For 1.0.0, the test volume is about 6 gigabytes in circa 120,000
files.
The code is organised as a library, with a programming interface.
The bzip2 program itself is a client of the library. You can use
the library in your own programs, to directly read and write .bz2 files,
or even just to compress data in memory using the bzip2 algorithms.
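
As a rough illustration of compressing data in memory, here is a minimal sketch. It does not call the C library interface directly; instead it assumes Python's standard bz2 module, which is built on top of the library. The sample payload and variable names are made up for the example.

```python
import bz2

# Sample payload (hypothetical): repetitive text compresses well.
original = b"bzip2 packs more stuff into your overfull disk drives. " * 200

# Compress entirely in memory. compresslevel=9 selects the largest
# block size (900k), the same as "bzip2 -9".
compressed = bz2.compress(original, compresslevel=9)

# Decompress again and keep the result so the round trip can be checked.
restored = bz2.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
```

The compressed bytes form an ordinary .bz2 stream, so writing them to a file should give something the bzip2 program can decompress.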
Getting the latest version: bzip2-1.0.1
See below for what's new in 1.0.0. Version 1.0.0
is an improvement over 0.1pl2, 0.9.0 and 0.9.5, but the file format is
unchanged, so the four versions should interwork fine. 1.0.1 is identical
to 1.0.0, except that a couple of obscure build problems on Windows platforms
have been fixed, and there are some minor documentation updates.
If you have a working 1.0.0, upgrading to 1.0.1 is not necessary.
Executables
First off, here are some executables I've collected. Because 1.0.0 is
pretty new, this list is very small. If your system isn't listed, there
may be an older version available: see the next section. As with previous
releases, I will expand this list as people donate executables for other
systems.
Please read the notes on executables before
downloading. You might avoid some common problems.
There's increasing demand for the library as a DLL (Win32) or as Unix dynamic
shared objects (.so's). Here are some. Once again, please
read the notes on executables before downloading.
Linux users, you first need to find out which libc version you have, by
doing 'ls /lib/*libc-*'.
Sun Sparc, Solaris. A shared library is supplied as standard with
Solaris 8. For earlier versions of Solaris, you might want to try
http://www.sunfreeware.com, or
you might be able to build a .so without much difficulty from the sources.
If you can be bothered, please email
me to say you've got a copy. It's nice to know where this
stuff gets to.
Getting an older version: bzip2-0.9.5d or bzip2-0.9.0c
Although older, these versions should work fine, unless you need large
(> 2GB) file support. Please read the notes
on executables before downloading. You might avoid some common
problems.
PC, Linux 2.0.36, statically linked to avoid current Linux libc version
insanity.
If your machine isn't listed here, don't despair. bzip2 is very portable.
It should run on practically any 32- or 64-bit computer, if you have enough
spare memory (at least 8 megabytes). If you have an ANSI C compiler,
you have a very good chance of building a working executable from the sources
with minimal difficulty.
TO USE: Rename the file you've got to plain "bzip2" (or "bzip2.exe",
on Win95/98/NT/2000), and use it.
To decompress a .bz2 file, do "bzip2 -d my_file.bz2". Remember,
the one program does both compression and decompression. To get decompression
by default, copy "bzip2.exe" to "bunzip2.exe" (Win95/98/NT/2000), or symlink
"bzip2" to "bunzip2" (Unix users).
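
If you'd rather do the same thing from a script, the following is a minimal sketch of what "bzip2 -d my_file.bz2" does, assuming Python's standard bz2 module is available. The helper name bunzip2_file and the temporary demo file are hypothetical, and unlike the real program this sketch leaves the compressed input in place.

```python
import bz2
import os
import shutil
import tempfile


def bunzip2_file(path):
    """Decompress a .bz2 file next to itself and return the output path.

    Hypothetical helper: unlike "bzip2 -d", it keeps the input file.
    """
    if not path.endswith(".bz2"):
        raise ValueError("expected a .bz2 file: " + path)
    out_path = path[:-len(".bz2")]
    # Stream the data across so large files need not fit in memory.
    with bz2.open(path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


# Demo: create a small my_file.bz2 in a scratch directory, then decompress it.
workdir = tempfile.mkdtemp()
bz2_path = os.path.join(workdir, "my_file.bz2")
with bz2.open(bz2_path, "wb") as f:
    f.write(b"hello, bzip2\n")

out_path = bunzip2_file(bz2_path)
with open(out_path, "rb") as f:
    contents = f.read()
```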
Some notes on executables:
If Netscape tries to display the executable as text rather than saving
it to disk, try cancelling the operation. Instead, do shift-Click,
or right-click on the link to get a menu. Similar tricks (a right-click?)
will probably get you a menu in Internet Explorer, with which to save the
file.
I hope that these executables work correctly and don't do nasty things,
but can't guarantee that, since I have no way to test most of them.
If you're as paranoid as I am, and want to use bzip2 to compress Extremely
Important Data, you might want to build it from the source code.
It's really very easy. That way you get a self-test of the program,
which might catch unforeseen nasties on obscure machine/OS combinations.
Here's the Unix man page, so you can see properly how to use it. For full
documentation, download the source bundle.
Documentation
Here's the HTML version of the complete manual, unfortunately lacking the
license page due to some oddity of texi2html. And here's the PostScript.
Many people have asked about Y2K issues in bzip2. Here's
a short statement.
What's new in 1.0.0 ?
Support for large files (> 2 GB) on OSs that support it. Seems to
work for Solaris 7, Tru64 (nee Digital) Unix 5.0, HP-UX 10.20 and 11.00,
Cygwin B20.1 on Windows 2000, and natively (MS VC 6.0) on Windows NT.
Faster compression: 10%-25% faster than 0.9.5. As ever, your mileage
may vary.
Much better robustness to corrupted compressed data -- mainly of interest
if you use the library.
Minor portability enhancements: now builds out of the box on Cygwin, as
well as Unixes and Win32.
A couple of minor bugs in file handling have been fixed.
Can be built as a shared library, at least on x86-Linux.
The CHANGES file gives more details.
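
To give a feel for what robustness to corrupted compressed data means for a program using the library, here is a small sketch. It assumes Python's standard bz2 module (a wrapper around the library) rather than the C interface, and the payload is made up. It damages one byte of a compressed stream and checks that decompression reports an error instead of returning bad data.

```python
import bz2

# Hypothetical payload for the demonstration.
payload = b"Extremely Important Data\n" * 100
compressed = bz2.compress(payload)

# Damage one byte. Offset 10 lies inside the first block's stored CRC:
# a 4-byte stream header and a 6-byte block marker come before it.
damaged = bytearray(compressed)
damaged[10] ^= 0xFF

try:
    bz2.decompress(bytes(damaged))
    detected = False
except (OSError, ValueError) as err:
    # The module reports invalid or truncated streams as exceptions.
    detected = True
    print("corruption reported:", err)
```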
What's new in 0.9.5 ?
Not many big changes. Mostly a slow evolution of 0.9.0 into something
more robust. Still, you should try and move to 0.9.5 as and when
you can.
Compression speed is much less sensitive to the input data than in previous
versions. Specifically, the very slow performance caused by repetitive
data is fixed.
Many small improvements in file and flag handling.
More portable Makefile, hopefully.
A Y2K statement.
What's new in 0.9.0 ?
0.9.0 is the first public version since 0.1pl2. The central feature
of 0.9.0 is that the code has been completely reorganised, so that the
main compress/decompress machinery is in a library. The bzip2 program
is now merely a wrapper on top of the library. I've also incorporated
various small speedups, functionality enhancements and portability things
-- mostly stuff that was frequently requested in your feedback.
Note that the .bz2 file format is unchanged, so 0.9.0 is fully forwards
and backwards compatible with 0.1pl2.
Specific changes:
A library interface, so your programs can read/write .bz2 files directly.
Compilation as a Windows DLL, and in a stdio.h-free environment for embedded
applications, is supported.
Speedups: 10% faster compression, 30% faster decompression. Your
mileage may vary :)
More flexible licensing (BSD style license), to allow the possibility of
commercial use of the software.
Support for concatenated compressed files. A succession of concatenated
.bz2 files can be correctly decompressed to yield the concatenation of
the originals.
Further portability enhancements.
Better documentation. There's now a full user manual, in PostScript
and HTML form.
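
The support for concatenated compressed files mentioned in the list above can be tried out with a short sketch. It assumes Python's standard bz2 module (version 3.3 or later, which handles multi-stream input); the sample contents are invented for the example.

```python
import bz2

# Two hypothetical "files".
first = b"first file\n"
second = b"second file\n"

# Compress each one separately, then join the compressed streams,
# much as "cat a.bz2 b.bz2 > both.bz2" would.
joined = bz2.compress(first) + bz2.compress(second)

# Decompressing the joined data yields the concatenation of the originals.
restored = bz2.decompress(joined)
```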
Contributed stuff
A patch for GNU tar 1.13 so you can make it compress with bzip2. The
relevant flags are -y or --bzip2 or --bunzip2. From Kevin Ivory and
David Fetter and modified by Thomas Bucholz. Several other people
also sent patches; thank you for them.
I'm an (experimental) compiler-writer by trade. At the moment I work
as a research assistant for Glasgow University, helping develop a
compiler for the functional language Haskell. The Glasgow Haskell
compiler serves as a testbed for research into Haskell,
and at the same time is a stable, well regarded, freely available, state
of the art optimising compiler for Haskell. It's available for most
major platforms. Perhaps you'd care to give it a spin. We're
close to releasing version 4.07 of our compiler and supporting tools.
It's open source. Naturally.
In the more distant past, I worked for five years on parallelising compilers
for functional languages at the University of Manchester, UK. I'm a big fan of Haskell,
an elegant and useful functional language. Getting a bit bored with
C? Try doing some lazy functional programming in Haskell. It'll
change the way you think about programming. Permanently.
Memory behaviour has a big effect on the performance of programs -- especially
bzip2. I tried and failed to find a decent, open-source tool which
would tell me exactly which lines of code produce cache misses, and in
the end I wrote my own. It's a useful performance analysis tool,
and I think it totally Kicks Ass. Your opinion may differ.
In any case, you can get it from http://www.cacheprof.org.