What happened to bzip1?

40

bzip2 has been a de facto standard for fairly strong compression for many years. I have typed the bzip2 command thousands of times by now, which makes me wonder: what happened to bzip, or bzip1? Google doesn't seem to tell me much about it, and it looks like it could be an interesting history lesson.

    
by d33tah 21.04.2014 / 13:35

1 answer

32

It seems the original bzip was withdrawn around 1998 because of patent problems with the arithmetic coding it used. A bit of digging (really just reading Wikipedia) turns up an archived link to the bzip2 website from that era.

Here is the relevant section, which details this and other differences:

How does it relate to your previous offering (bzip-0.21) ?

bzip2 is a rewritten and re-engineered version of 0.21. It looks superficially fairly similar, but has been almost entirely re-written (several times :-). The important differences are:

  • Patent-free! (I hope; see statement above). bzip-0.21 used arithmetic coding; bzip2 uses Huffman coding, which is generally regarded as non-problematic from a patent standpoint. Both programs are based on the Burrows-Wheeler transform, but, to the best of my knowledge, that's not patented either.

  • Faster, particularly at decompression. bzip2 decompresses more than 50% faster than 0.21, mostly because of the use of Huffman coding. I've also improved the compression speed, although not that much -- perhaps it compresses 30% faster than 0.21.

  • Recovery from media errors. Both programs compress data in blocks, by default, 900k long. With bzip2, each block is handled completely independently, carries its own checksum, and is delimited by a 48-bit sequence. So, if you have a damaged compressed file, bzip2 can extract the compressed blocks, detect which ones are undamaged, and decompress those.

  • Test mode. You can test integrity of compressed files without having to decompress them. I should have put this in 0.21, really, but was too lazy (+ burnt-out with hacking by the time I released it).

  • Handles very repetitive files much better. Such files are a worst-case for any block-sorting compressor. bzip2 runs approximately ten times faster than 0.21 for such files.

  • Support for smaller machines. bzip2 can decompress any file it creates in 2300k, which means you can decompress files on 4-meg machines. Peak memory use during compression is also reduced by about 900k compared with 0.21, to around 6400k.

  • Better flag handling. In particular, long flags (--like --this) are supported, which makes it easier to use.

  • The one-line startup message which 0.21 printed, is gone. This was 0.21's most complained-about feature. It even bugs me nowadays.

I'm no longer distributing 0.21, because doing so perpetuates problems with patents, which ensures that the program will never be widely used. That's a shame, because it's a useful program, and lots of people seem to like it. If you use 0.21 already, please upgrade to bzip2. I can't, unfortunately, make bzip2 be able to decompress 0.21's .bz files, since that would render the patent-avoidance exercise pointless. I know changing file formats is painful; from now on, I'll try and make any further changes in a backwards compatible way.

There is also a link to a decompress-only version of the bzip source code for anyone who wants to play with it.
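Two of the points in the quoted list, the 48-bit block delimiter and the test mode, can be poked at with Python's standard bz2 module. What follows is only a rough sketch under a few assumptions: the marker value 0x314159265359 comes from my understanding of the bzip2 format rather than from the quoted text, the helper names find_block_bit_offsets and is_intact are made up for illustration, and is_intact is just an approximation of what `bzip2 -t` does (decode everything, keep nothing), not the tool's actual implementation.

```python
import bz2

# Assumed 48-bit marker that opens every bzip2 block (the digits of pi in hex).
BLOCK_MAGIC = bytes.fromhex("314159265359")


def find_block_bit_offsets(stream: bytes) -> list[int]:
    """Return the bit offsets at which the block marker occurs.

    Blocks are not byte-aligned, so the stream is re-examined at each of
    the eight possible bit shifts and searched with a plain byte search.
    """
    total_bits = len(stream) * 8
    mask = (1 << total_bits) - 1
    value = int.from_bytes(stream, "big")
    offsets = []
    for shift in range(8):
        # Dropping the top `shift` bits makes markers at that phase byte-aligned.
        shifted = ((value << shift) & mask).to_bytes(len(stream), "big")
        pos = shifted.find(BLOCK_MAGIC)
        while pos != -1:
            offsets.append(pos * 8 + shift)
            pos = shifted.find(BLOCK_MAGIC, pos + 1)
    return sorted(offsets)


def is_intact(stream: bytes) -> bool:
    """Rough analogue of a test mode: decode in memory and discard the result."""
    try:
        bz2.decompress(stream)
    except (OSError, ValueError):
        # OSError: invalid data or checksum mismatch; ValueError: truncated stream.
        return False
    return True


if __name__ == "__main__":
    # compresslevel=1 selects 100k blocks, so 300,000 bytes span several blocks.
    payload = bytes(i % 251 for i in range(300_000))
    packed = bz2.compress(payload, compresslevel=1)
    print("block markers at bit offsets:", find_block_bit_offsets(packed))

    damaged = bytearray(packed)
    damaged[10] ^= 0xFF  # flip one byte just after the first block marker
    print("intact:", is_intact(packed), "damaged:", is_intact(bytes(damaged)))
```

Scanning for the marker is, as far as I can tell, the same basic idea a recovery tool needs: once the block boundaries are known, each block can be checked on its own, which is what the quoted "recovery from media errors" point relies on.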

    
by 21.04.2014 / 14:12