
Bitstream compression

Started by Rob Gaddi July 28, 2011
Hey all--

So my experience with the native bitstream compression algorithms 
provided by the FPGA vendors has been that they don't actually achieve 
all that much compression.  This makes sense given that the expansion 
has to be accomplished in the FPGA hardware, and so can't be all that 
aggressive.

I generally find myself programming FPGAs from microcontrollers, and so 
I've got clock cycles and RAM to throw at the problem to try to do 
better.  We've had some limited luck implementing really stupid RLE, 
just compressing the runs of 0x00 and 0xFF and calling it a day.  But I 
was wondering whether anyone's found a better, more universal approach.
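For concreteness, the decode side of that scheme is on the order of the
sketch below.  The format details (escape bytes 0x00/0xFF as run tags,
one count byte) are invented for the example; our real code differs:

/* Minimal sketch of escape-coded RLE as described above -- a
 * hypothetical format, not production code.  A tag byte of 0x00 or
 * 0xFF means "run": the next byte is the run length (1..255).  Any
 * other byte is a literal.  Literal 0x00/0xFF values are encoded as
 * runs of length 1. */
#include <stddef.h>
#include <stdint.h>

size_t rle_decode(const uint8_t *src, size_t srclen,
                  uint8_t *dst, size_t dstlen)
{
    size_t i = 0, o = 0;
    while (i < srclen) {
        uint8_t tag = src[i++];
        if (tag == 0x00 || tag == 0xFF) {
            if (i >= srclen) return 0;     /* truncated input */
            uint8_t n = src[i++];
            if (o + n > dstlen) return 0;  /* output overrun */
            while (n--) dst[o++] = tag;
        } else {
            if (o >= dstlen) return 0;
            dst[o++] = tag;
        }
    }
    return o;   /* number of bytes produced */
}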

The ideal would be a compression/expansion library with fairly good 
compression ratios, where the expansion side could operate without 
malloc and on-the-fly on a stream of data rather than requiring large 
buffers.  The compression side could use phenomenal amounts of compute 
power and RAM, since it would be happening on the PC.  But the goal 
would be to be able to decompress on something like an LPC1754 (32 kB 
RAM total).
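
To put it concretely, I'm imagining an interface shaped something like
this (names and sizes are placeholders, not an existing library):

/* Hypothetical interface sketch -- just to show the constraints:
 * all state in a caller-supplied struct, no malloc, data pushed
 * through in small chunks as it arrives from flash or a UART. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t window[8192];   /* whole decoder state, statically sized */
    /* ... algorithm-specific fields ... */
} decomp_state_t;

void decomp_init(decomp_state_t *st);

/* Consume up to *inlen input bytes, produce up to *outlen output
 * bytes; both are updated to the amounts actually used/produced.
 * Returns 0 while more data is expected, 1 on end of stream,
 * <0 on error. */
int decomp_push(decomp_state_t *st,
                const uint8_t *in, size_t *inlen,
                uint8_t *out, size_t *outlen);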

Anyone have any thoughts?

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order

Rob Gaddi wrote:

> Hey all--
>
> So my experience with the native bitstream compression algorithms
> provided by the FPGA vendors has been that they don't actually achieve
> all that much compression.

> The ideal would be a compression/expansion library with fairly good
> compression ratios, where the expansion side could operate without
> malloc and on-the-fly on a stream of data rather than requiring large
> buffers.

It generally won't be possible to beat the vendor-provided basic
compression by more than a factor of ~1.5 or so, and a gain of 1.5x
wouldn't really improve anything.

> The compression side could use phenomenal amounts of compute
> power and RAM, since it would be happening on the PC.  But the goal
> would be to be able to decompress on something like an LPC1754
> (32 kB RAM total).
>
> Anyone have any thoughts?

Just try 7zip on the FPGA binaries and see for yourself.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:
> Rob Gaddi wrote:
>> So my experience with the native bitstream compression algorithms
>> provided by the FPGA vendors has been that they don't actually
>> achieve all that much compression.
>
> It generally won't be possible to beat the vendor-provided basic
> compression by more than a factor of ~1.5 or so, and a gain of 1.5x
> wouldn't really improve anything.
>
> Just try 7zip on the FPGA binaries and see for yourself.

Just did, on an FPGA with an admittedly small fill ratio.  The
uncompressed bitstream for an XC3S200 is 1,047,616 bits.  Using Xilinx
bitstream compression gets it down to 814,336 bits, or about 100kB.
7-Zip knocked it down to 16kB.

Another design uses a decent amount of an XC6S45.  Native size is
11,875,104 bits (~1.5MB).  Bitstream compression gives me 1.35MB.
7-Zip gives me 395kB.

I've got a project coming up around an Altera Arria II, with 30Mb of
configuration.  If I could get a 3:1, 4:1 compression ratio it would be
a pretty radical savings on flash size.

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order
On Thu, 28 Jul 2011 17:40:25 -0700, Rob Gaddi wrote:

> Just did, on an FPGA with an admittedly small fill ratio.  The
> uncompressed bitstream for an XC3S200 is 1,047,616 bits.  Using
> Xilinx bitstream compression gets it down to 814,336 bits, or about
> 100kB.  7-Zip knocked it down to 16kB.
>
> I've got a project coming up around an Altera Arria II, with 30Mb of
> configuration.  If I could get a 3:1, 4:1 compression ratio it would
> be a pretty radical savings on flash size.

A design done for a client, using Virtex 6, with gzip -9 for
compression.  Figures are from memory and are approximate.

Uncompressed: 7.5 MBytes
Compressed:   a few tens of kB  (empty FPGA)
Compressed:   500 kBytes        (mostly empty FPGA)
Compressed:   2 MBytes          (mostly full FPGA)
Compressed:   7.5 MBytes        (with bitstream encryption)

Note: there's no point using compression with encryption, as the
encrypted images don't compress.

I used gzip because the decompressor is open source and fairly "light".

The Xilinx built-in compression doesn't do a good job because it merely
joins identical frames.  The chance of getting identical frames is low
for any design that uses a reasonable amount of the fabric.  (If you're
not using most of the fabric, you could be using a smaller, cheaper
device.)

OTOH, if you are using bitstream encryption, the built-in compression
is the only compression you can use, and it will be better than
nothing.

Regards,
Allan
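
PS: if you go the gzip route, zlib's inflate API can run the
decompression in fixed buffers on the target.  A rough sketch (error
handling trimmed; the read_flash()/write_fpga() helpers and the buffer
sizes are placeholders, not a real API):

/* Streaming gzip decompression with zlib -- assumes zlib has been
 * ported to the target.  read_flash() and write_fpga() are
 * hypothetical glue functions. */
#include <string.h>
#include "zlib.h"

#define CHUNK 1024

extern int  read_flash(unsigned char *buf, int maxlen);   /* hypothetical */
extern void write_fpga(const unsigned char *buf, int len);/* hypothetical */

int gunzip_to_fpga(void)
{
    static unsigned char in[CHUNK], out[CHUNK];
    z_stream strm;
    int ret = Z_OK;

    memset(&strm, 0, sizeof strm);
    /* 16 + MAX_WBITS tells zlib to expect a gzip wrapper. */
    if (inflateInit2(&strm, 16 + MAX_WBITS) != Z_OK)
        return -1;

    do {
        strm.avail_in = read_flash(in, CHUNK);
        if (strm.avail_in == 0) break;
        strm.next_in = in;
        do {
            strm.avail_out = CHUNK;
            strm.next_out  = out;
            ret = inflate(&strm, Z_NO_FLUSH);
            if (ret != Z_OK && ret != Z_STREAM_END) {
                inflateEnd(&strm);
                return -1;
            }
            write_fpga(out, CHUNK - strm.avail_out);
        } while (strm.avail_out == 0);
    } while (ret != Z_STREAM_END);

    inflateEnd(&strm);
    return ret == Z_STREAM_END ? 0 : -1;
}

One caveat on a 32 kB part: a standard gzip stream can reference up to
32 kB of history, and inflate allocates that window internally, so it
won't fit on an LPC1754 unless you compress with a smaller window
(deflateInit2/inflateInit2 windowBits) on both ends.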
From memory, Altera has much better "native" compression than
Xilinx, with typical compression ratios of about 50%.

Regards,

Thomas

www.entner-electronics.com
On Jul 28, 8:40 pm, Rob Gaddi <rga...@technologyhighland.com> wrote:

> Just did, on an FPGA with an admittedly small fill ratio.  The
> uncompressed bitstream for an XC3S200 is 1,047,616 bits.  Using
> Xilinx bitstream compression gets it down to 814,336 bits, or about
> 100kB.  7-Zip knocked it down to 16kB.
>
> I've got a project coming up around an Altera Arria II, with 30Mb of
> configuration.  If I could get a 3:1, 4:1 compression ratio it would
> be a pretty radical savings on flash size.
The algorithm that 7-Zip uses internally for .7z file compression is
LZMA:

http://en.wikipedia.org/wiki/Lzma

One characteristic of LZMA is that its decoder is much simpler than the
encoder, making it well-suited for your application.  You would be most
interested in the open-source LZMA SDK:

http://www.7-zip.org/sdk.html

They provide an ANSI C implementation of a decoder that you can port to
your platform.  Also, there is a reference application for an encoder
(I believe also written in C).  I have used this in the past to
compress firmware for embedded systems with good success.  I use the
encoder as a post-build step to compress the firmware image into an
LZMA stream (note that the compression is not done with 7-Zip, as the
.7z format is a full-blown archive; the reference encoder just gives
you a stream of compressed data, which is what you want).  The
resulting file is then decompressed on the embedded target at firmware
update time.  The decoder source code is most amenable to porting to
32-bit architectures; I have implemented it on the LPC2138 ARM7 device
(with the same 32 KB of RAM as your part) as well as a few AVR32UC3
parts.

A couple other things: I originally did this ~4 years ago with a much
older version of the SDK; it's possible that things have changed since
then, but it should still be worth a look.  LZMA provides good
compression ratios with a decoder that, in my experience, runs well on
embedded platforms.  Secondly, you do have to be careful with the
parameters you use at the encoder if you want to bound the amount of
memory required at the decoder.  More specifically, you need to be
careful which "dictionary size" you use for the encoder.  As you might
expect, a larger dictionary gives you better compression ratios, but
the target running the decoder will require at least that much memory
(e.g. an 8 KB dictionary size will require at least 8 KB of memory for
the decoder).

Jason
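
PS: for reference, the decode call sequence looks roughly like the
sketch below.  The API names are from the SDK's LzmaDec.h as I remember
them, so check them against the version you download; the static
allocator and the arena size are placeholders.  This also assumes the
5 properties bytes sit directly in front of the compressed data; the
standalone .lzma file format inserts an 8-byte uncompressed-size field
in between, so adjust the offset if you use that.

/* One-shot LZMA decode with the LZMA SDK's C decoder.  The bump
 * allocator over a static arena is one way to avoid malloc on the
 * target; the arena must hold the probability model plus the
 * dictionary, so size it for the dictionary you chose at the
 * encoder. */
#include <string.h>
#include "LzmaDec.h"

static unsigned char arena[24 * 1024];   /* placeholder size */
static size_t arena_used;

static void *my_alloc(void *p, size_t size)
{
    void *r = &arena[arena_used];
    (void)p;
    if (arena_used + size > sizeof arena) return NULL;
    arena_used += size;
    return r;
}
static void my_free(void *p, void *addr) { (void)p; (void)addr; }
static ISzAlloc alloc = { my_alloc, my_free };

/* src: 5 properties bytes followed by compressed data. */
int lzma_unpack(const unsigned char *src, size_t srclen,
                unsigned char *dst, size_t dstlen)
{
    CLzmaDec dec;
    ELzmaStatus status;
    SizeT outlen = dstlen, inlen;

    if (srclen < LZMA_PROPS_SIZE) return -1;
    inlen = srclen - LZMA_PROPS_SIZE;

    LzmaDec_Construct(&dec);
    if (LzmaDec_Allocate(&dec, src, LZMA_PROPS_SIZE, &alloc) != SZ_OK)
        return -1;
    LzmaDec_Init(&dec);
    if (LzmaDec_DecodeToBuf(&dec, dst, &outlen,
                            src + LZMA_PROPS_SIZE, &inlen,
                            LZMA_FINISH_END, &status) != SZ_OK) {
        LzmaDec_Free(&dec, &alloc);
        return -1;
    }
    LzmaDec_Free(&dec, &alloc);
    return (int)outlen;
}

On a 32 kB part you'd pick a correspondingly small dictionary at the
encoder, per the sizing point above.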
Rob Gaddi <rgaddi@technologyhighland.com> wrote:

> So my experience with the native bitstream compression algorithms
> provided by the FPGA vendors has been that they don't actually achieve
> all that much compression.
>
> We've had some limited luck implementing really stupid RLE,
> just compressing the runs of 0x00 and 0xFF and calling it a day.  But I
> was wondering whether anyone's found a better, more universal approach.

The problem is that Huffman-based algorithms don't gain that much.  I
recall someone has had lots of success using an RLE-like scheme on
nibbles (4 bit) instead of whole bytes.  Look here:

http://www.sulimma.de/prak/ws0001/projekte/ralph/Projekt/index.htm

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)

Rob Gaddi wrote:

> On 7/28/2011 5:15 PM, Vladimir Vassilevsky wrote:
>> It generally won't be possible to beat the vendor-provided basic
>> compression by more than a factor of ~1.5 or so.
>
> Native size is 11,875,104 bits (~1.5MB).  Bitstream compression gives
> me 1.35MB.  7-Zip gives me 395kB.

Interesting.  I tried to compress JBCs from a heavily used Altera
Cyclone and got only about 1.5x compression.

As for a compression algorithm, something like a bit-oriented LZSS or
LZW would be easy to implement.  The decompression part is trivial.
If you don't care about speed, the compression part is very simple,
too.  I don't know whether further sophistication of the algorithm
would make much of a difference.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
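
PS: to show how little code the decode side needs, here is a minimal
byte-oriented LZSS decoder.  The token format (one flag byte per eight
tokens, 12-bit offset / 4-bit length matches) is invented for
illustration; any format works as long as both ends agree:

/* Minimal LZSS decoder sketch.  The output buffer doubles as the
 * match window, so no RAM is needed beyond it. */
#include <stddef.h>
#include <stdint.h>

size_t lzss_decode(const uint8_t *src, size_t srclen,
                   uint8_t *dst, size_t dstlen)
{
    size_t i = 0, o = 0;
    while (i < srclen) {
        uint8_t flags = src[i++];            /* 8 token-type bits */
        for (int b = 0; b < 8 && i < srclen; b++) {
            if (flags & (1u << b)) {         /* literal byte */
                if (o >= dstlen) return 0;
                dst[o++] = src[i++];
            } else {                         /* (offset, length) match */
                if (i + 2 > srclen) return 0;
                uint16_t tok = (uint16_t)(src[i] << 8 | src[i + 1]);
                i += 2;
                size_t off = (size_t)(tok >> 4) + 1;   /* 1..4096 back */
                size_t len = (size_t)(tok & 0x0F) + 3; /* 3..18 bytes  */
                if (off > o || o + len > dstlen) return 0;
                while (len--) { dst[o] = dst[o - off]; o++; }
            }
        }
    }
    return o;
}

If you stream the output straight to the FPGA instead of buffering it,
you keep a ring buffer the size of the offset range (4 kB here) as the
window.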
[ NB: I've added comp.compression to the mix ]

Jason wrote:

> One characteristic of LZMA is that its decoder is much simpler than
> the encoder, making it well-suited for your application.  You would
> be most interested in the open-source LZMA SDK:
>
> http://www.7-zip.org/sdk.html

Lempel-Ziv-Oberhumer (LZO) might also be worth investigating.

http://en.wikipedia.org/wiki/Lempel–Ziv–Oberhumer

The LZO library implements a number of algorithms with the following
characteristics:

* Compression is comparable in speed to deflate compression.
* Very fast decompression.
* Requires an additional buffer during compression (of size 8 kB or
  64 kB, depending on compression level).
* Requires no additional memory for decompression other than the
  source and destination buffers.
* Allows the user to adjust the balance between compression quality
  and compression speed, without affecting the speed of decompression.

Regards.
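
PS: with the miniLZO subset, the decompression call is essentially a
one-liner.  A sketch from memory (check minilzo.h in the release you
pull; the wrapper function here is just for illustration):

/* Decompression with miniLZO -- no work memory is needed on the
 * decode side; only compression requires a scratch buffer. */
#include "minilzo.h"

int unpack_bitstream(const unsigned char *src, lzo_uint srclen,
                     unsigned char *dst, lzo_uint dstcap)
{
    lzo_uint dstlen = dstcap;

    if (lzo_init() != LZO_E_OK)          /* once at startup */
        return -1;
    /* The _safe variant bounds-checks against the output buffer. */
    if (lzo1x_decompress_safe(src, srclen, dst, &dstlen, NULL) != LZO_E_OK)
        return -1;
    return (int)dstlen;
}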
[ Dropping comp.arch.fpga ]
Noob wrote:
> Lempel-Ziv-Oberhumer (LZO) might also be worth investigating.
>
> The LZO library implements a number of algorithms with the following
> characteristics:
> * Compression is comparable in speed to deflate compression.
> * Very fast decompression.

Some of the following benchmarks may be of interest.

http://www.maximumcompression.com/
http://www.maximumcompression.com/data/exe.php
http://www.maximumcompression.com/data/summary_mf4.php

Regards.
