
Bitstream compression

Started by Rob Gaddi July 28, 2011
Hey all--

So my experience with the native bitstream compression algorithms 
provided by the FPGA vendors has been that they don't actually achieve 
all that much compression.  This makes sense given that the expansion 
has to be accomplished in the FPGA hardware, and so can't be all that 
aggressive.

I generally find myself programming FPGAs from microcontrollers, and so 
I've got clock cycles and RAM to throw at the problem to try to do 
better.  We've had some limited luck implementing really stupid RLE, 
just compressing the runs of 0x00 and 0xFF and calling it a day.  But I 
was wondering whether anyone's found a better, more universal approach.
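For concreteness, the decode side of that scheme is on the order of the
sketch below.  The format details (escape bytes 0x00/0xFF as run tags,
one count byte) are invented for the example; our real code differs:

/* Minimal sketch of escape-coded RLE as described above -- a
 * hypothetical format, not production code.  A tag byte of 0x00 or
 * 0xFF means "run": the next byte is the run length (1..255).  Any
 * other byte is a literal.  Literal 0x00/0xFF values are encoded as
 * runs of length 1. */
#include <stddef.h>
#include <stdint.h>

size_t rle_decode(const uint8_t *src, size_t srclen,
                  uint8_t *dst, size_t dstlen)
{
    size_t i = 0, o = 0;
    while (i < srclen) {
        uint8_t tag = src[i++];
        if (tag == 0x00 || tag == 0xFF) {
            if (i >= srclen) return 0;     /* truncated input */
            uint8_t n = src[i++];
            if (o + n > dstlen) return 0;  /* output overrun */
            while (n--) dst[o++] = tag;
        } else {
            if (o >= dstlen) return 0;
            dst[o++] = tag;
        }
    }
    return o;   /* number of bytes produced */
}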

The ideal would be a compression/expansion library with fairly good 
compression ratios, where the expansion side could operate without 
malloc and on-the-fly on a stream of data rather than requiring large 
buffers.  The compression side could use phenomenal amounts of compute 
power and RAM, since it would be happening on the PC.  But the goal 
would be to be able to decompress on something like an LPC1754 (32 kB 
RAM total).
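
To put it concretely, I'm imagining an interface shaped something like
this (names and sizes are placeholders, not an existing library):

/* Hypothetical interface sketch -- just to show the constraints:
 * all state in a caller-supplied struct, no malloc, data pushed
 * through in small chunks as it arrives from flash or a UART. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t window[8192];   /* whole decoder state, statically sized */
    /* ... algorithm-specific fields ... */
} decomp_state_t;

void decomp_init(decomp_state_t *st);

/* Consume up to *inlen input bytes, produce up to *outlen output
 * bytes; both are updated to the amounts actually used/produced.
 * Returns 0 while more data is expected, 1 on end of stream,
 * <0 on error. */
int decomp_push(decomp_state_t *st,
                const uint8_t *in, size_t *inlen,
                uint8_t *out, size_t *outlen);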

Anyone have any thoughts?

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order

Rob Gaddi wrote:

> Hey all--
>
> So my experience with the native bitstream compression algorithms
> provided by the FPGA vendors has been that they don't actually achieve
> all that much compression.

> The ideal would be a compression/expansion library with fairly good
> compression ratios, where the expansion side could operate without
> malloc and on-the-fly on a stream of data rather than requiring large
> buffers.

It generally won't be possible to beat the vendor-provided basic
compression by more than a factor of ~1.5 or so, and a gain of 1.5x
wouldn't really improve anything.

> The compression side could use phenomenal amounts of compute
> power and RAM, since it would be happening on the PC.  But the goal
> would be to be able to decompress on something like an LPC1754
> (32 kB RAM total).
>
> Anyone have any thoughts?

Just try 7zip on the FPGA binaries and see for yourself.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:
> Rob Gaddi wrote:
>> So my experience with the native bitstream compression algorithms
>> provided by the FPGA vendors has been that they don't actually
>> achieve all that much compression.
>
> It generally won't be possible to beat the vendor-provided basic
> compression by more than a factor of ~1.5 or so, and a gain of 1.5x
> wouldn't really improve anything.
>
> Just try 7zip on the FPGA binaries and see for yourself.

Just did, on an FPGA with an admittedly small fill ratio.  The
uncompressed bitstream for an XC3S200 is 1,047,616 bits.  Using Xilinx
bitstream compression gets it down to 814,336 bits, or about 100kB.
7-Zip knocked it down to 16kB.

Another design uses a decent amount of an XC6S45.  Native size is
11,875,104 bits (~1.5MB).  Bitstream compression gives me 1.35MB.
7-Zip gives me 395kB.

I've got a project coming up around an Altera Arria II, with 30Mb of
configuration.  If I could get a 3:1, 4:1 compression ratio it would be
a pretty radical savings on flash size.

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order
On Thu, 28 Jul 2011 17:40:25 -0700, Rob Gaddi wrote:

> Just did, on an FPGA with an admittedly small fill ratio.  The
> uncompressed bitstream for an XC3S200 is 1,047,616 bits.  Using
> Xilinx bitstream compression gets it down to 814,336 bits, or about
> 100kB.  7-Zip knocked it down to 16kB.
>
> I've got a project coming up around an Altera Arria II, with 30Mb of
> configuration.  If I could get a 3:1, 4:1 compression ratio it would
> be a pretty radical savings on flash size.

A design done for a client, using Virtex 6, with gzip -9 for
compression.  Figures are from memory and are approximate.

Uncompressed: 7.5 MBytes
Compressed:   a few tens of kB  (empty FPGA)
Compressed:   500 kBytes        (mostly empty FPGA)
Compressed:   2 MBytes          (mostly full FPGA)
Compressed:   7.5 MBytes        (with bitstream encryption)

Note: there's no point using compression with encryption, as the
encrypted images don't compress.

I used gzip because the decompressor is open source and fairly "light".

The Xilinx built-in compression doesn't do a good job because it merely
joins identical frames.  The chance of getting identical frames is low
for any design that uses a reasonable amount of the fabric.  (If you're
not using most of the fabric, you could be using a smaller, cheaper
device.)

OTOH, if you are using bitstream encryption, the built-in compression
is the only compression you can use, and it will be better than
nothing.

Regards,
Allan
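
PS: if you go the gzip route, zlib's inflate API can run the
decompression in fixed buffers on the target.  A rough sketch (error
handling trimmed; the read_flash()/write_fpga() helpers and the buffer
sizes are placeholders, not a real API):

/* Streaming gzip decompression with zlib -- assumes zlib has been
 * ported to the target.  read_flash() and write_fpga() are
 * hypothetical glue functions. */
#include <string.h>
#include "zlib.h"

#define CHUNK 1024

extern int  read_flash(unsigned char *buf, int maxlen);   /* hypothetical */
extern void write_fpga(const unsigned char *buf, int len);/* hypothetical */

int gunzip_to_fpga(void)
{
    static unsigned char in[CHUNK], out[CHUNK];
    z_stream strm;
    int ret = Z_OK;

    memset(&strm, 0, sizeof strm);
    /* 16 + MAX_WBITS tells zlib to expect a gzip wrapper. */
    if (inflateInit2(&strm, 16 + MAX_WBITS) != Z_OK)
        return -1;

    do {
        strm.avail_in = read_flash(in, CHUNK);
        if (strm.avail_in == 0) break;
        strm.next_in = in;
        do {
            strm.avail_out = CHUNK;
            strm.next_out  = out;
            ret = inflate(&strm, Z_NO_FLUSH);
            if (ret != Z_OK && ret != Z_STREAM_END) {
                inflateEnd(&strm);
                return -1;
            }
            write_fpga(out, CHUNK - strm.avail_out);
        } while (strm.avail_out == 0);
    } while (ret != Z_STREAM_END);

    inflateEnd(&strm);
    return ret == Z_STREAM_END ? 0 : -1;
}

One caveat on a 32 kB part: a standard gzip stream can reference up to
32 kB of history, and inflate allocates that window internally, so it
won't fit on an LPC1754 unless you compress with a smaller window
(deflateInit2/inflateInit2 windowBits) on both ends.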
From memory, Altera has much better "native" compression than
Xilinx, with typical compression ratios of about 50%.

Regards,

Thomas

www.entner-electronics.com
On Jul 28, 8:40 pm, Rob Gaddi <rga...@technologyhighland.com> wrote:

> Just did, on an FPGA with an admittedly small fill ratio.  The
> uncompressed bitstream for an XC3S200 is 1,047,616 bits.  Using
> Xilinx bitstream compression gets it down to 814,336 bits, or about
> 100kB.  7-Zip knocked it down to 16kB.
>
> I've got a project coming up around an Altera Arria II, with 30Mb of
> configuration.  If I could get a 3:1, 4:1 compression ratio it would
> be a pretty radical savings on flash size.
The algorithm that 7-Zip uses internally for .7z file compression is
LZMA:

http://en.wikipedia.org/wiki/Lzma

One characteristic of LZMA is that its decoder is much simpler than the
encoder, making it well-suited for your application.  You would be most
interested in the open-source LZMA SDK:

http://www.7-zip.org/sdk.html

They provide an ANSI C implementation of a decoder that you can port to
your platform.  Also, there is a reference application for an encoder
(I believe also written in C).  I have used this in the past to
compress firmware for embedded systems with good success.  I use the
encoder as a post-build step to compress the firmware image into an
LZMA stream (note that the compression is not done with 7-Zip, as the
.7z format is a full-blown archive; the reference encoder just gives
you a stream of compressed data, which is what you want).  The
resulting file is then decompressed on the embedded target at firmware
update time.  The decoder source code is most amenable to porting to
32-bit architectures; I have implemented it on the LPC2138 ARM7 device
(with the same 32 KB of RAM as your part) as well as a few AVR32UC3
parts.

A couple other things: I originally did this ~4 years ago with a much
older version of the SDK; it's possible that things have changed since
then, but it should still be worth a look.  LZMA provides good
compression ratios with a decoder that, in my experience, runs well on
embedded platforms.  Secondly, you do have to be careful with the
parameters you use at the encoder if you want to bound the amount of
memory required at the decoder.  More specifically, you need to be
careful which "dictionary size" you use for the encoder.  As you might
expect, a larger dictionary gives you better compression ratios, but
the target running the decoder will require at least that much memory
(e.g. an 8 KB dictionary size will require at least 8 KB of memory for
the decoder).

Jason
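
PS: for reference, the decode call sequence looks roughly like the
sketch below.  The API names are from the SDK's LzmaDec.h as I remember
them, so check them against the version you download; the static
allocator and the arena size are placeholders.  This also assumes the
5 properties bytes sit directly in front of the compressed data; the
standalone .lzma file format inserts an 8-byte uncompressed-size field
in between, so adjust the offset if you use that.

/* One-shot LZMA decode with the LZMA SDK's C decoder.  The bump
 * allocator over a static arena is one way to avoid malloc on the
 * target; the arena must hold the probability model plus the
 * dictionary, so size it for the dictionary you chose at the
 * encoder. */
#include <string.h>
#include "LzmaDec.h"

static unsigned char arena[24 * 1024];   /* placeholder size */
static size_t arena_used;

static void *my_alloc(void *p, size_t size)
{
    void *r = &arena[arena_used];
    (void)p;
    if (arena_used + size > sizeof arena) return NULL;
    arena_used += size;
    return r;
}
static void my_free(void *p, void *addr) { (void)p; (void)addr; }
static ISzAlloc alloc = { my_alloc, my_free };

/* src: 5 properties bytes followed by compressed data. */
int lzma_unpack(const unsigned char *src, size_t srclen,
                unsigned char *dst, size_t dstlen)
{
    CLzmaDec dec;
    ELzmaStatus status;
    SizeT outlen = dstlen, inlen;

    if (srclen < LZMA_PROPS_SIZE) return -1;
    inlen = srclen - LZMA_PROPS_SIZE;

    LzmaDec_Construct(&dec);
    if (LzmaDec_Allocate(&dec, src, LZMA_PROPS_SIZE, &alloc) != SZ_OK)
        return -1;
    LzmaDec_Init(&dec);
    if (LzmaDec_DecodeToBuf(&dec, dst, &outlen,
                            src + LZMA_PROPS_SIZE, &inlen,
                            LZMA_FINISH_END, &status) != SZ_OK) {
        LzmaDec_Free(&dec, &alloc);
        return -1;
    }
    LzmaDec_Free(&dec, &alloc);
    return (int)outlen;
}

On a 32 kB part you'd pick a correspondingly small dictionary at the
encoder, per the sizing point above.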
Rob Gaddi <rgaddi@technologyhighland.com> wrote:

> So my experience with the native bitstream compression algorithms
> provided by the FPGA vendors has been that they don't actually achieve
> all that much compression.
>
> We've had some limited luck implementing really stupid RLE,
> just compressing the runs of 0x00 and 0xFF and calling it a day.  But I
> was wondering whether anyone's found a better, more universal approach.

The problem is that Huffman-based algorithms don't gain that much.  I
recall someone has had lots of success using an RLE-like scheme on
nibbles (4 bit) instead of whole bytes.  Look here:

http://www.sulimma.de/prak/ws0001/projekte/ralph/Projekt/index.htm

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)

Rob Gaddi wrote:

> On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:
>> It generally won't be possible to beat the vendor-provided basic
>> compression by more than a factor of ~1.5 or so.
>
> Native size is 11,875,104 bits (~1.5MB).  Bitstream compression gives
> me 1.35MB.  7-Zip gives me 395kB.

Interesting.  I tried to compress JBCs from a heavily used Altera
Cyclone and got only about 1.5x compression.

As for a compression algorithm, something like a bit-oriented LZSS or
LZW would be easy to implement.  The decompression part is trivial.
If you don't care about speed, the compression part is very simple,
too.  I don't know whether further sophistication of the algorithm
would make much of a difference.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
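
PS: to show how little code the decode side needs, here is a minimal
byte-oriented LZSS decoder.  The token format (one flag byte per eight
tokens, 12-bit offset / 4-bit length matches) is invented for
illustration; any format works as long as both ends agree:

/* Minimal LZSS decoder sketch.  The output buffer doubles as the
 * match window, so no RAM is needed beyond it. */
#include <stddef.h>
#include <stdint.h>

size_t lzss_decode(const uint8_t *src, size_t srclen,
                   uint8_t *dst, size_t dstlen)
{
    size_t i = 0, o = 0;
    while (i < srclen) {
        uint8_t flags = src[i++];            /* 8 token-type bits */
        for (int b = 0; b < 8 && i < srclen; b++) {
            if (flags & (1u << b)) {         /* literal byte */
                if (o >= dstlen) return 0;
                dst[o++] = src[i++];
            } else {                         /* (offset, length) match */
                if (i + 2 > srclen) return 0;
                uint16_t tok = (uint16_t)(src[i] << 8 | src[i + 1]);
                i += 2;
                size_t off = (size_t)(tok >> 4) + 1;   /* 1..4096 back */
                size_t len = (size_t)(tok & 0x0F) + 3; /* 3..18 bytes  */
                if (off > o || o + len > dstlen) return 0;
                while (len--) { dst[o] = dst[o - off]; o++; }
            }
        }
    }
    return o;
}

If you stream the output straight to the FPGA instead of buffering it,
you keep a ring buffer the size of the offset range (4 kB here) as the
window.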
[ NB: I've added comp.compression to the mix ]

Jason wrote:

> One characteristic of LZMA is that its decoder is much simpler than
> the encoder, making it well-suited for your application.  You would
> be most interested in the open-source LZMA SDK:
>
> http://www.7-zip.org/sdk.html

Lempel-Ziv-Oberhumer (LZO) might also be worth investigating.

http://en.wikipedia.org/wiki/Lempel–Ziv–Oberhumer

The LZO library implements a number of algorithms with the following
characteristics:

* Compression is comparable in speed to deflate compression.
* Very fast decompression.
* Requires an additional buffer during compression (of size 8 kB or
  64 kB, depending on compression level).
* Requires no additional memory for decompression other than the
  source and destination buffers.
* Allows the user to adjust the balance between compression quality
  and compression speed, without affecting the speed of decompression.

Regards.
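
PS: with the miniLZO subset, the decompression call is essentially a
one-liner.  A sketch from memory (check minilzo.h in the release you
pull; the wrapper function here is just for illustration):

/* Decompression with miniLZO -- no work memory is needed on the
 * decode side; only compression requires a scratch buffer. */
#include "minilzo.h"

int unpack_bitstream(const unsigned char *src, lzo_uint srclen,
                     unsigned char *dst, lzo_uint dstcap)
{
    lzo_uint dstlen = dstcap;

    if (lzo_init() != LZO_E_OK)          /* once at startup */
        return -1;
    /* The _safe variant bounds-checks against the output buffer. */
    if (lzo1x_decompress_safe(src, srclen, dst, &dstlen, NULL) != LZO_E_OK)
        return -1;
    return (int)dstlen;
}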
[ Dropping comp.arch.fpga ]
Noob wrote:
> Lempel-Ziv-Oberhumer (LZO) might also be worth investigating.
>
> The LZO library implements a number of algorithms with the following
> characteristics:
> * Compression is comparable in speed to deflate compression.
> * Very fast decompression.

Some of the following benchmarks may be of interest.

http://www.maximumcompression.com/
http://www.maximumcompression.com/data/exe.php
http://www.maximumcompression.com/data/summary_mf4.php

Regards.
