Practical CRCs for Embedded Systems

Stephen FriederichsOctober 21, 20157 comments

CRCs are a very practical tool for embedded systems: you're likely to need to use one as part of a communications protocol or to verify the integrity of a program image before writing it to flash. But CRCs can be difficult to understand and tricky to implement. The first time I attempted to write CRC code from scratch I failed once. Then twice. Then three times. Eventually I gave up and used an existing library. I consider myself intelligent: I got A's in school (that counts for something right?). But despite all of my education I couldn't translate the discussions of polynomial division, hamming distance and bit error rates into practical, useful code.

The reality of the situation is that without devoting a fair bit of time to it you won't be able to develop CRC code from scratch. For professionals time is often of the essence and while gaining an in-depth understanding of the particulars of CRC implementations is nice it's probably not what you're being paid to do. Luckily, there are resources that you can use which will help you implement common CRC algorithms quickly and (more importantly) correctly

The goal of this article is to show you how to implement common CRC algorithms quickly and with a minimum of fuss. It will not discuss any of the following:

Explaining how CRCs work - While I think I understand how CRCs work I'm in no way qualified to explain it to anyone else. Quite frankly, the how of CRCs is a heavy topic that touches on a lot of areas that are difficult for me. The ur-reference for CRCs is 'A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS' by Ross N. Williams. It's a hand-formatted text file, so it's very trustworthy.

Writing CRC code from scratch - There's rarely a need to reinvent the wheel in a professional setting but if you find that the approach in this article is lacking in one aspect or another (speed, memory usage, etc.) you may need to roll your own implementation. If you want a guide to writing your own CRC code, try the Barr Group's article on implementing CRCs.

Choosing the 'best' CRC algorithm/polynomial - Entire books and academic papers have been written on choosing the 'best' or at least, most appropriate CRC algorithm for specific situations. This article couldn't hope to scratch the surface. If you're interested in finding the 'best' algorithm for a particular purposes, start by reading Philip Koopman's CRC blog and in particular, his article on the best CRC polynomials to use.

This article doesn't need to focus on any of those questions because it turns out there are several standard CRCs that tend to be used often. For example, Bluetooth uses one called CRC-16 (which, amazingly enough, produces a 16-bit CRC value). There's also CRC-32 which is even more popular: it's used in Ethernet, MPEG-2, PNG, Gzip and in many other places. These are well-understood and have been independently implemented hundreds if not thousands of times. In fact they are so well-understood that there exists a Python utility which can automatically generate C implementations for a variety of these well-known CRCs: PyCRC.

It is capable of generating this code for either named CRC models or for CRC models specified by their fundamental parameters:

  • Width - The size (in bits) of the resulting CRC
  • Poly - The hexadecimal representation of the CRC polynomial
  • Reflect In - True or False: whether to perform a reflection operation on the input of the CRC algorithm
  • XOR In - The initial data value used when calculating the CRC
  • Reflect Out- True or False: whether to perform a reflection operation on the output of the CRC algorithm before returning the final result
  • XOR Out- The value to XOR the result of the CRC algorithm with before returning it

You'll notice I was rather vague about what all of these actually mean. The reasoning behind that is twofold:

  1. The explanations of these parameters are out of the scope of this article: instead, you should just consider the values of these parameters as inputs to PyCRC. Understanding them is unnecessary for this article and possibly confusing. 
  2. You don't need to know anything about these parameters if you simply want to implement a well-known algorithm - PyCRC knows the right values for common algorithms already.

To use PyCRC you'll need an installation of Python - either Python 2.7 or Python 3.0. Installation of PyCRC is covered on their website so this article won't focus on that, but I will demonstrate how to use it to generate C code with PyCRC, show you how to integrate that code into your own program and give you some basic pointers to save your sanity when working with CRCs. Lastly, I'll compare the advantages and disadvantages of various CRC generation algorithms that can be implemented in embedded systems.

Generating Code with PyCRC

PyCRC will ultimately generate two files which implement your the CRC:

  • crc.h - A header file which defines the CRC library API, and
  • crc.c - A source file which contains the CRC library implementation

Note that although I specified file names, you don't have to use them if it doesn't fit your application.

PyCRC lives on the command line, so in order to generate these files you'll have to open the command prompt (if you live on Windows), navigate to the directory containing the PyCRC script, and type a command similar to this:

python pycrc.py  --model <CRC model> --algorithm <CRC algorithm> --generate c -o crc.c
python pycrc.py  --model <CRC model> --algorithm <CRC algorithm> --generate h -o crc.h

There's some blanks in there, so let's fill them in one-by-one.

CRC Model

The CRC model is the name of the CRC method that you're trying to implement: 'CRC-32-MPEG' for example. PyCRC knows several of these models by name and doesn't require you to input all of the fundamental parameters yourself. The full list of models that PyCRC knows can be found here. If there's a model that you want to implement that PyCRC doesn't know you will have to enter all of the fundamental parameters yourself. You can find a list of those for a wide variety of models here.

CRC Algorithm

The CRC algorithm is the practical set of steps undertaken to produce a CRC from input data. There isn't just one way to generate a CRC - PyCRC implements three CRC generation algorithms each with their own advantages and disadvantages:

  • Bit-by-Bit - The most straightforward and least optimized CRC algorithm. Easy to understand, but slow.
  • Table-Driven - Uses a large lookup table to generate CRCs. Significantly faster than Bit-by-Bit but at the cost of increased memory usage.
  • Bit-by-Bit-Fast - An optimization of the Bit-by-Bit algorithm. Same memory usage as Bit-by-Bit but is faster (though not nearly as fast as table-driven). More difficult to understand.

If you're looking to learn more about how to write CRC code, you should pick apart the code generated for the Bit-by-Bit algorithm. The best all-around algorithm to generate a CRC in the real world is the Table-Driven algorithm - unless you have a severe memory limitation (like some embedded systems do). If that's the case, stick with Bit-by-Bit-Fast to save memory.

Generated Source

Knowing the CRC model and algorithm, you can generate the crc.c and crc.h files which implement your chosen configuration. In those files PyCRC will generate a CRC implementation that provides the following:

  • A CRC type via a typedef (crc_t). The actual type of the CRC depends on the width of the CRC (and by extension, the algorithm used). It's worth noting that PyCRC will rely on stdint.h for standard integer types (uint32_t, uint16_t, etc.) so if your toolchain doesn't provide that, you'll have to modify the generated code to use the correct built-in type for your architecture.
  • Three functions which allow you to generate CRCs: crc_init, crc_finalize and crc_update. These three functions will handle all CRC generation: the init function will return a CRC will return a CRC with the proper initial value for the specified algorithm, the update function adds one or more bytes of data to the CRC and the finalize function returns the final CRC. 

Testing the Source

I always like to debug whatever I can on my desktop PC before I start debugging on an embedded system. To that end, I will usually either unit test all of the libraries I write (or generate) or create a little driver program that allows me to access the functionality of the libraries on the command line. To test this CRC code I decided to write a driver program which does three things:

  • Identifies its software revision 
  • Performs a self-test of the CRC library, and
  • Generates the CRC of specific data 

It's useful for any program to include a software revision and report it to ease the debugging process. Debugging a CRC generation program can be tedious and frustrating because you will usually tweak the software a lot before it generates the correct CRC. It's very useful to know for certain what revision of the code produces the correct result. For extra points, utilize source control to save every revision and note the result of the self-test in the commit message so you can restore old, working code if you screw up later.

The 'self-test' functionality of the driver program is crucial. Working with CRCs is a frustrating experience because you often have no idea if the output of the CRC code is correct or not. There's a way to mitigate that for standard CRC algorithms: the 'check' result. The 'check' result is the CRC of a standard data sequence generated with a particular CRC model. The standard data sequence in question is the string "123456789" encoded in ASCII. Thus, if you represent it as hex data in bytes it is: 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39. Everyone uses this string to test their CRC algorithms. You can find the list of 'check' results for the models PyCRC implements here. If you're using a custom model you'll have to use a CRC generation tool such as RevEng.

At some point you'll want to generate the CRC of some specific data on a microcontroller and you'll need a way to verify the microcontroller is generating the correct CRCs all the time and not just during self-test. The best way to ensure this is to generate the CRC twice and compare the results: once using the driver program and once on your embedded system. Many communication systems work in a similar way: they will generate a CRC for the data in a message, then append it to the end of the data transmission. That way, the other end can generate a CRC on the recieved data and compare the received vs. expected CRC. Usually a CRC mismach means a transmission error occurred. It can also be used as a debugging tool to help you ensure that the mundane details of your embedded system are in order: things like big vs. little-endian transmission or data storage. Thus, it's crucial that you add the ability to generate a CRC for arbitrary data to your driver program.

There is a Github repository which contains the source for the desktop driver program I created. It can be compiled in a MinGW environment as described here. It should produce the correct 'check' result for CRC-32-MPEG: 0x0376E6E7. It will also produce the CRC of a file containing random data: data/data.bin. More on this later.

CRCs for Embedded Systems

With the driver program complete and producing the correct results we have a basis of comparison for the CRC results produced on an embedded system. To show how the CRC code generated by PyCRC can be integrated into an embedded application, I created a program for the AVR ATMega328P which can be found here. The program does the same things as the driver program on the PC:

  • During startup it transmits a version identifier over the UART
  • It generates the 'check' CRC for CRC-32-MPEG and transmits it over the UART
  • It generates the CRC of the first 512 bytes of data sent to it over the UART and transmits the result back to the host PC

So, to test the embedded CRC code, all I have to do is verify the 'check' CRC produced by the embedded system is correct (and use the byte-orientation of the transmitted value as a basis for interpreting later results) and then transmit the data.bin file the driver program produced the CRC of to the ATMega328P and verify it produces the same CRC.  With this straightforward approach we can quickly verify the correctness of the embedded code.

That's not all the program does. It also implements four separate CRC algorithms on the ATMega328P to compare their performance:

  • Bit-by-Bit
  • Bit-by-Bit-Fast
  • Table-Driven (RAM)
  • Table-Driven (Program Memory)

The performance of each of these algorithms was determined by raising an I/O pin at the start of the CRC generation process and lowering it at the end. Execution of each CRC calculation was separated from the next by a 1ms delay to allow the timing to be read more easily on the oscilloscope. 

You're familiar with Bit-by-Bit and Bit-by-Bit-Fast, but I changed-up the Table-Driven algorithm a bit. Let me explain why. Earlier, I suggested that the Table-Driven approach was memory-intensive but didn't specify just how memory intensive it was. The Table-Driven approach requires a lookup table with as many entries as the number of possible values of an 8-bit variable (256). Each entry's size is equal to the length of the resulting CRC. In this case, using the CRC-32-MPEG model (which produces a 32 bit / 4 byte CRC), that means that the size of the table is 256 entries * 4 bytes/entry = 1024 bytes or 1KByte. 

On most embedded systems all malleable data is stored in RAM of some sort. RAM is fast to access, but usually expensive in one way or another (power usage, physical size on the chip, monetary cost, etc.), so it is usually limited and embedded system designers are hesitant to use it if they don't have to. On the ATMega328P there is a total of 2K of SRAM and a full half of it will be taken up by the CRC table when using the Table-Driven approach. However, because the CRC table is constant data and we won't need to modify it, we can avoid storing the data in RAM and store it in Program Memory instead. Other than being unable to change the data, the only other disadvantage is that program is generally slower to access than RAM. By making this change we minimize the disadvantage of the Table-Driven algorithm (high memory usage) at the cost of lower performance - but how much lower? The only way to know is to find out, so I implemented the Table-Driven algorithm twice with one table in RAM and the other in Program Memory. 

Going in I assumed the Bit-by-Bit algorithm would be the slowest probably followed by Bit-by-Bit-Fast. I assumed that Table-Driven (RAM) would be fastest, but I had no idea where Table-Driven (Program Memory) would fall. It was an exciting test!

Results

The following are (roughly) the times needed to calculate the CRC of 512 bytes of data (the contents of data/data.bin) using each of the algorithms:

  • Bit-by-Bit: ~31ms
  • Bit-by-Bit-Fast: ~20.8ms
  • Table-Driven (RAM): ~2.8ms
  • Table-Driven (Program Memory): ~2.8ms

To me, this is a surprising result. My Bit-by-Bit and Bit-by-Bit-Fast guesses were spot-on and technically so was my Table-Driven (RAM) guess. However, I expected some decrease in performance when the table was stored in Program Memory and I'm astounded to see none at all. The differences in performance are interesting: Bit-by-Bit-Fast is a respectable 33% faster than the base Bit-by-Bit, but the Table-Driven approaches are an order of magnitude (1/10th the time) improvement.

Caveat: These results are only meant to give a relative comparison of the performance of the different algorithms on a single embedded platform. There's no way these results will guarantee any specific behavior or timing on your embedded system - particularly with respect to the performance of the Table-Driven (Program Memory) algorithm. Although many microcontrollers are capable of accessing constant data in program memory the results seen here depend heavily on hardware, not software. Thus, while these results certainly look good for the Table-Driven (Program Memory) algorithm, you should do your own tests on your particular embedded system to verify it will be sufficient for your purposes.

Conclusions

  • It's much easier to use PyCRC to generate an implementation of a well-known CRC model than to roll your own implementation.
  • A desktop-based driver program will help to verify the operation of the generated CRC code.
  • If your microcontroller has a sufficient amount of RAM, go straight for a table-driven approach with the table stored in RAM.
  • If you have concerns about RAM usage, but your microcontroller can read data from Program Memory, attempt to store the table in Program Memory and compare the performance vs RAM before deciding. 
  • If none of the above applies to you, use the bit-by-bit-fast method.

Hopefully this article will help save you some time and frustration when implementing CRCs in embedded systems. The code I linked to is free for you to use in any way you please.



Previous post by Stephen Friederichs:
   Coding Step 3 - High-Level Requirements
Next post by Stephen Friederichs:
   Coding Step 4 - Design


Comments:

[ - ]
Comment by microsoftwareOctober 23, 2015
Hm , do you check STEP BY STEP the binary division that calculates the CRC of your sample data ? Is there a mathematical description (it is, may be, simple and quick) about the procedure of calculation of CRC?
Don’t be sure about online calculators that don’t present the binary division, you don’t know what is encrypted behind them. Check the binary division only. And yes, if the CRC of your sample data is correct then the CRC of random data shall be correct. CRC procedure is so sensitive,
Thanks.
[ - ]
Comment by jms_nhOctober 25, 2015
I agree, this issue was one I faced last year (see my article on CRCs: http://www.embeddedrelated.com/showarticle/669.php)
[ - ]
Comment by microsoftwareOctober 23, 2015
How are you sure that your PYCRC code running at your Pc produces correct CRC results ?
[ - ]
Comment by SFriederichsOctober 23, 2015
By verifying the result of the 'check' against the value found online. It can't 100% verify that the CRC of the random data is correct, but if the code works for the check value, it should work for the random data.
[ - ]
Comment by microsoftwareOctober 25, 2015
Oh, no , you are presenting times of calculation 31 ms , 2.8 ms etc. but wthat is the frequency of the clock of your board? Knowing the frequency of the clock and the time passed to finish the calculation of CRC you can calculate the number of cycles that the CPU spent to finish the calculation.
Thanks.
[ - ]
Comment by SFriederichsOctober 26, 2015
It's an 8MHz clock, but calculating the number of clock cycles isn't really going to give you very interesting information unless you're planning on always using an ATMega328P. The differences in work done vs clock cycle for different processors are going to invalidate any direct performance comparison you want to make. The timing is provided mainly for illustrative purposes to show the efficiency of the various algorithms.
[ - ]
Comment by microsoftwareOctober 26, 2015
Yes, the number of cycles spent by the CPU is a standard for comparison the efficincy of various algorithms and the efficiency of various microcontrollers.I think that 2.8 msec for 512 bytes is a big time. (2800?ses*8=22400 cycles for 512 bytes or about 43 cycles per byte. Big I think.

thanks

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Registering will allow you to participate to the forums on ALL the related sites and give you access to all pdf downloads.

Sign up
or Sign in