EmbeddedRelated.com
Forums

Bitstream compression

Started by Rob Gaddi July 28, 2011
On 7/29/2011 7:36 AM, Noob wrote:
> [ NB: I've added comp.compression to the mix ]
>
> Jason wrote:
>
>> Rob Gaddi wrote:
>>
>>> Just did, on an FPGA with an admittedly small fill ratio. The
>>> uncompressed bitstream for an XC3S200 is 1,047,616 bits. Using Xilinx
>>> bitstream compression gets it down to 814,336 bits, or about 100kB.
>>> 7-Zip knocked it down to 16kB.
>>>
>>> Another design uses a decent amount of an XC6S45. Native size is
>>> 11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip
>>> gives me 395kB.
>>>
>>> I've got a project coming up around an Altera Arria II, with 30Mb of
>>> configuration. If I could get a 3:1 or 4:1 compression ratio, it would
>>> be a pretty radical savings on flash size.
>>
>> The algorithm that 7-Zip uses internally for .7z file compression is
>> LZMA:
>>
>> http://en.wikipedia.org/wiki/Lzma
>>
>> One characteristic of LZMA is that its decoder is much simpler than
>> the encoder, making it well-suited for your application. You would be
>> most interested in the open-source LZMA SDK:
>>
>> http://www.7-zip.org/sdk.html
>>
>> They provide an ANSI C implementation of a decoder that you can port
>> to your platform. There is also a reference application for an
>> encoder (I believe also written in C). I have used this in the past to
>> compress firmware for embedded systems with good success. I use the
>> encoder as a post-build step to compress the firmware image into an
>> LZMA stream (note that the compression is not done with 7-Zip, as the
>> .7z format is a full-blown archive; the reference encoder just gives
>> you a stream of compressed data, which is what you want). The
>> resulting file is then decompressed on the embedded target at firmware
>> update time. The decoder source code is most amenable to porting to
>> 32-bit architectures; I have implemented it on the LPC2138 ARM7 device
>> (with the same 32 KB of RAM as your part) as well as a few AVR32UC3
>> parts.
>> A couple of other things: I originally did this ~4 years ago with a
>> much older version of the SDK; it's possible that things have changed
>> since then, but it should still be worth a look. LZMA provides good
>> compression ratios with a decoder that, in my experience, runs well on
>> embedded platforms. Secondly, you do have to be careful with the
>> parameters you use at the encoder if you want to bound the amount of
>> memory required at the decoder. More specifically, you need to be
>> careful which "dictionary size" you use for the encoder. As you might
>> expect, a larger dictionary gives you better compression ratios, but
>> the target running the decoder will require at least that much memory
>> (e.g. an 8 KB dictionary size will require at least 8 KB of memory for
>> the decoder).
>
> Lempel–Ziv–Oberhumer (LZO) might also be worth investigating.
>
> http://en.wikipedia.org/wiki/Lempel–Ziv–Oberhumer
>
> The LZO library implements a number of algorithms with the
> following characteristics:
> * Compression is comparable in speed to deflate compression.
> * Very fast decompression.
> * Requires an additional buffer during compression (of size 8 kB or
>   64 kB, depending on compression level).
> * Requires no additional memory for decompression other than the
>   source and destination buffers.
> * Allows the user to adjust the balance between compression quality
>   and compression speed, without affecting the speed of decompression.
>
> Regards.
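The dictionary-size trade-off described above is easy to see for yourself. A minimal sketch using Python's standard-library lzma module (not the C SDK the post describes; the parameter names here are the stdlib's, and the payload is just made-up bitstream-like data):

```python
import lzma

# Fake payload with long runs and some structure, as FPGA bitstreams have.
data = (b"\x00" * 4096 + b"\xFF" * 4096 + bytes(range(256)) * 16) * 8

# Raw LZMA2 stream with a bounded dictionary: the decoder needs at least
# dict_size bytes of RAM for its window, so keep it small for a tiny target.
filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 16}]  # 64 KB

compressed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)

# FORMAT_RAW carries no header, so the decoder must be handed the same
# filter chain out of band -- exactly the situation on an embedded target.
restored = lzma.decompress(compressed, format=lzma.FORMAT_RAW, filters=filters)

assert restored == data
print(len(data), "->", len(compressed), "bytes")
```

Shrinking dict_size toward the 4 KB minimum trades compression ratio for decoder RAM, which is the knob the post is warning about.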
Got the following in an email from Magnus Eriksson, who pled lack of Usenet access:

--------------

A bunch of years ago I read about something called "pucrunch", for compressing programs for use on the C64 -- or, in modern parlance, perhaps "a CPU and memory restricted target system": it has a 1 MHz CPU with few registers, and 64K RAM in total.

Pucrunch uses LZ77 and run-length encoding, with some tweaks; that is what the author decided was the right compromise between compression ratio and memory usage. And it is a "cruncher", which means that the end result is a self-decompressing binary, one that unpacks a program to RAM, and the decompressing code has to stay out of its way as far as possible -- I believe the generic 6502 decompression routine takes up only _354 bytes_ of code, if I'm reading the code comments correctly.

The catch is that you'll have to write your own decompression routine (but that should hardly come as a surprise). There is pseudocode in the article, and 6502 and Z80 code linked. The compressor is just one file of C, so it should be easy to test the compression ratio first, at least. It will almost certainly be worse than 7-Zip, but just how well it does on a bitstream might be interesting to see.

You can find it here:

Article: An Optimizing Hybrid LZ77 RLE Data Compression Program, aka Improving Compression Ratio for Low-Resource Decompression
http://www.cs.tut.fi/~albert/Dev/pucrunch/

Some related things in the parent directory too:
http://www.cs.tut.fi/~albert/Dev/

Hope that might be of some help, or inspiration to further experimentation.

Take care,
Magnus
On Fri, 29 Jul 2011 09:21:45 -0500, Vladimir Vassilevsky
<nospam@nowhere.com> wrote:

> Rob Gaddi wrote:
>
>> On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:
>>> Rob Gaddi wrote:
>>>
>>>> So my experience with the native bitstream compression algorithms
>>>> provided by the FPGA vendors has been that they don't actually achieve
>>>> all that much compression.
>>>
>>> It won't generally be possible to beat the vendor-provided basic
>>> compression by more than a factor of ~1.5 or so. A gain of 1.5 times
>>> wouldn't really improve anything.
>>>
>> Native size is
>> 11,875,104 bits (~1.5MB). Bitstream compression gives me 1.35MB. 7-Zip
>> gives me 395kB.
>
> Interesting.
>
> I tried to compress JBCs from a heavily used Altera Cyclone, and got only
> about 1.5x compression.
>
> As for the compression algorithm, something like a bit-oriented LZSS or
> LZW would be easy to implement. The decompression part is trivial. If you
> don't care about speed, the compression part is very simple, too. I
> don't know if further sophistication of the algorithm would make much
> of a difference.
>
> Vladimir Vassilevsky
> DSP and Mixed Signal Design Consultant
> http://www.abvolt.com
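For a sense of how little code a dictionary coder like the LZW mentioned above really needs, here is a toy byte-oriented LZW sketch in Python. It is not the bit-packed variant an embedded implementation would use -- codes are left as a plain list of integers rather than packed into 9-16 bit fields, and the table grows without bound:

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as LZW codes; the table starts with all 256 single bytes."""
    table = {bytes([i]): i for i in range(256)}
    out, run = [], b""
    for b in data:
        cand = run + bytes([b])
        if cand in table:
            run = cand           # keep extending the current match
        else:
            out.append(table[run])
            table[cand] = len(table)   # grow the dictionary
            run = bytes([b])
    if run:
        out.append(table[run])
    return out

def lzw_decompress(codes: list[int]) -> bytes:
    """Invert lzw_compress, rebuilding the same dictionary as codes arrive."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # The classic KwKwK case: the code may name the entry still being built.
        entry = table[code] if code in table else prev + prev[:1]
        out.append(entry)
        table[len(table)] = prev + entry[:1]
        prev = entry
    return b"".join(out)
```

As the post says, the decompressor is the trivial half: it needs no searching, just a table rebuild. A real embedded version would cap the table size to bound decoder memory, the same trade-off as LZMA's dictionary size.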
I've done a very simple byte-oriented RLE thing on Xilinx chips and got config files that were 20 to 40% as big as the original binaries. Even a pretty full chip has long runs of 0x00 and 0xFF bytes in the bitstream.

The uP code to decompress this and bang the FPGA is pretty simple. In serial bit-bang mode, decompressing and configuring is faster than uncompressed, because spitting out a long string of identical bits can be done as a fast special case... just wiggle the config clock!

John
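A run-length scheme along those lines can be tiny. John doesn't give his exact byte format, so the following Python sketch is just one plausible layout: a hypothetical escape byte introduces (value, count) pairs for runs, and everything else passes through literally.

```python
ESC = 0x5A  # arbitrary escape byte chosen for this sketch

def rle_encode(data: bytes) -> bytes:
    """Encode runs of >= 4 identical bytes as ESC, value, count (1-255)."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        run = j - i
        # Long runs pay off as triples; a literal ESC must also be escaped.
        if run >= 4 or data[i] == ESC:
            out += bytes([ESC, data[i], run])
        else:
            out += data[i:j]
        i = j
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Expand ESC, value, count triples; copy everything else through."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            out += bytes([data[i + 1]]) * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

On bitstreams dominated by 0x00/0xFF runs, even this much gets within shouting distance of the 20-40% figures quoted, and the decode loop is small enough for any configuration micro. The "fast special case" John mentions would live in the ESC branch of the decoder: instead of buffering the expanded run, just hold the config data pin at the run value and pulse the clock count*8 times.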