Linear Feedback Shift Registers for the Uninitiated, Part XVII: Reverse-Engineering the CRC
Last time, we continued a discussion about error detection and correction by covering Reed-Solomon encoding. I was going to move on to another topic, but then there was this post on Reddit asking how to determine unknown CRC parameters:
I am seeking to reverse engineer an 8-bit CRC. I don’t know the generator code that’s used, but can lay my hands on any number of output sequences given an input sequence.
This is something I call the “unknown oracle” problem. You have a black-box system (the “oracle”) that implements CRC calculation. Maybe it’s a server on the Internet, maybe it’s a computer program for which you don’t have the source code. In any case, you can submit any number of different messages to the oracle, and it will send back a CRC value.
But you don’t like relying on the oracle. It’s pissing you off. Maybe it charges you a fee every time you use it. Or you have to design a system, and it’s a liability to use some third-party software or service you can’t control. Or your arch-nemesis has programmed the oracle to respond to your CRC requests only through a Fran Drescher “The Nanny” Talking Doll which has been hacked with the appropriate voice samples. (“Oh, all right, ya really wanna know? It’s C9 03. Ah ha ha ha ha ha ha ha!”) So you’d like to figure out how to calculate the CRC yourself, and make sure that the oracle’s services are no longer necessary.
Can we do this? Absolutely!
Known Unknowns: Ice Cream Cone Endianness Again
The first thing we need is to figure out what information is required to compute a CRC that matches the oracle’s output.
We know we need to figure out the generating polynomial. But that’s not sufficient. I talked about CRCs in Part XV as well as in an earlier article, where I quoted Ross Williams’s Painless Guide to CRC Error Detection Algorithms, in which he mentions you need
- Width of the poly (polynomial).
- Value of the poly.
- Initial value for the register.
- Whether the bits of each byte are reflected before being processed.
- Whether the algorithm feeds input bytes through the register or xors them with a byte from one end and then straight into the table.
- Whether the final register value should be reversed (as in reflected versions).
- Value to XOR with the final register value.
We’re going to take care of most of these. Here’s my list of important parameters:
- generating polynomial \( g(x) \)
- initial value \( p_i(x) \)
- final value \( p_f(x) \)
- whether bytes are interpreted as least-significant-bit first or not.
- whether the LFSR is shifted forwards or backwards (in finite field math, this amounts to multiplying by \( x \) or \( x^{-1} \), respectively)
The ones we won’t be using are
- whether or not the generating polynomial’s coefficients are reversed — this is an internal implementation issue; if I use the polynomial \( g(x) = x^8 + x^4 + x^3 + x^2 + 1 \) (
0x11D
in hex,100011101
in binary) in a bit-reversed manner, then it’s the same thing as if I used the polynomial \( g(x) = x^8 + x^6 + x^5 + x^4 + 1 \) (0x171
in hex,101110001
in binary) in a non-bit-reversed manner. It doesn’t matter whether you order a chocolate strawberry peach vanilla banana pistachio peppermint lemon orange butterscotch ice cream cone, or a butterscotch orange lemon peppermint pistachio banana vanilla peach strawberry chocolate ice cream cone, as long as you and the ice cream vendor have the same idea of which scoop goes on top.
- independent choices for whether the input is sent in LSB first, and whether the output is sent in LSB first. This is possible but seems silly. Work with consistent endian-ness.
As I mentioned in part XV, there are two equivalent methods of handling CRCs. One is to feed the input into an LFSR, and the other is a mathematical computation using finite fields.
The LFSR approach is as follows:
- Create an LFSR with taps corresponding to the generating polynomial’s coefficients
- Initialize the LFSR with the coefficients of \( p_i(x) \), the initial value
- For each byte of the message \( M \), clock in the bits into the polynomial — LSB first or MSB first, depending on how you define the CRC. So the message
Hi!
is either01001000 01101001 00100001
(MSB first) or00010010 10010110 10000100
(LSB first). - Clock in \( N \) more zeros (where \( N \) is the number of bits in the LFSR and the degree of the generating polynomial)
- Take the LFSR state and XOR it with the coefficients of \( p_f(x) \), the final value, then convert to bytes, in the appropriate bit and byte order.
The finite field approach is to compute \( c(x) = f_m(M(x)) = \left(x^{N+8m}p_i(x) + x^N M(x) + p_f(x)\right) \bmod g(x) \) where \( m \) is the number of bytes in the message \( M(x) \). (Oh, and remember that we’re working in \( GF(2)[x] \), so all the polynomial coefficient math is modulo 2, and both addition and subtraction translate into an XOR.) The bit and byte order polarities just determine how bytes translate into polynomial coefficients.
These approaches are equivalent, and if you don’t understand finite fields you can always stick to plain old LFSRs. But the finite field description of a CRC is going to help us reverse-engineer the CRC parameters.
!Kcab Nrut !Kcab Nrut
The forward vs. backwards option of CRCs is somewhat annoying, and we’re forced to deal with it, because some people decided they were going to right-shift LFSRs rather than left shift. As I mentioned, in finite-field algebra this corresponds to multiplying by \( x^{-1} \) rather than \( x \); in the LFSR, what we do is:
- if the least significant bit of the LFSR is 1, we XOR the LFSR contents with the polynomial and right-shift, bringing a 1 bit into the most significant bit
- otherwise we right-shift, bringing a 0 bit into the most significant bit
This approach is used with the Ethernet/PNG/PKZIP CRC; if you look at the PNG specification’s sample code, you’ll see this snippet for preparing a table:
/* Make the table for a fast CRC. */
void make_crc_table(void)
{
unsigned long c;
int n, k;
for (n = 0; n < 256; n++) {
c = (unsigned long) n;
for (k = 0; k < 8; k++) {
if (c & 1)
c = 0xedb88320L ^ (c >> 1);
else
c = c >> 1;
}
crc_table[n] = c;
}
crc_table_computed = 1;
}
Here the polynomial is 0x1db710641
(which is 0xedb88320
left-shifted by 1 with a 1 in the LSB) and the table represents each of the possible 8-bit values multiplied by \( x^{-8} \) in the finite field.
It turns out that there’s an equivalent method of doing this, and that is to use the complementary polynomial (with coefficients reversed) and bit-reverse the entire LFSR contents:
from libgf2.gf2 import GF2QuotientRing qrfwd = GF2QuotientRing(0x1db710641) qrrev = GF2QuotientRing(0x104c11db7) def bit_reverse(x,n): y = 0 for _ in xrange(n): y <<= 1 y |= (x&1) x >>= 1 return y print('GF2QuotientRing(0x%x), 2^31 shifted right by 32: %08x' % (qrfwd.coeffs, qrfwd.rshiftraw(0x80000000,32))) bits_shifted_left = qrrev.lshiftraw(0x1,32) print(('GF2QuotientRing(0x%x), 1 shifted left by 32: %08x\n' +' ...and bit-reversed: %08x') % (qrrev.coeffs, bits_shifted_left, bit_reverse(bits_shifted_left, 32)))
GF2QuotientRing(0x1db710641), 2^31 shifted right by 32: edb88320 GF2QuotientRing(0x104c11db7), 1 shifted left by 32: 04c11db7 ...and bit-reversed: edb88320
So we can dispense with the need to deal with this \( x^{-1} \) business, and instead just calculate the CRC with a reversed polynomial and then reverse the output bits. This does have consequences, though, and we’ll deal with them later.
Affine vs. Linear
One important thing to note is that the general definition of a CRC is an affine function rather than a linear one — a truly linear function would produce zero output for zero input. But instead we have these pesky initial and final values. The function \( f(x) = mx \) is linear; if \( x = a_1x_1 + a_2x_2 \) is a linear superposition of two inputs, then \( f(x) \) is the same linear superposition of the outputs: \( f(x) = a_1f(x_1) + a_2f(x_2) \). Affine functions have an offset, like \( f(x) = mx+b \), and they mess up this linear superposition property a little bit. In the case of the CRC, we add an offset \( x^{N+8m}p_i(x) + p_f(x) \), making it an affine function for any fixed message length \( m \) in bytes. This is interesting because if we have any two messages \( M_1(x) \) and \( M_2(x) \) of length \( m \), and get CRCs \( c_1(x) = f_m(M_1(x)) \) and \( c_2(x) = f_m(M_2(x)) \), then the XOR of the CRCs is related to the XOR of the messages in a much simpler manner, and the initial and final values disappear:
$$c_1(x) + c_2(x) = x^N\left(M_1(x) + M_2(x)\right) \bmod g(x)$$
This also means that if \( z_m(x) = f_m(0) \) is the CRC result of feeding in an all-zero message of length \( m \), then \( \bar{c}(x) = c(x) + z_m(x) = x^N M(x) \bmod g(x) \) — we can get the initial and final values to go away if we compare with an all-zero message of the same length. Let’s see this in action:
from libgf2.util import parity def crc_bitwise(message, polynomial, init=0, final=0, lsbit_first=True, lsbyte_first=True, verbose=False): field = GF2QuotientRing(polynomial) n = field.degree polytail = polynomial ^ (1<<n) # "tail" = all bits of the polynomial except the leading one crc = field.lshiftraw(init,n) # This method uses the tail to avoid those extra n shifts # (we then need to pre-shift the initial value by n) for byte in message: bytecode = ord(byte) if verbose: print "%s (0x%02x)" % (repr(byte), bytecode) for bitnum in xrange(8): i = bitnum if lsbit_first else (7-bitnum) bit = (bytecode >> i) & 1 crc_prev = crc crc_prev_double = field.mulraw(2,crc) crc = crc_prev_double ^ (polytail if bit else 0) if verbose: print "%d %04x -> %04x -> %04x" % (bit, crc_prev, crc_prev_double, crc) if final: if verbose: print "%04x ^ %04x = %04x" % (crc, final, crc^final) crc ^= final # bit reverse within bytes if lsbit_first: out = 0 for k in xrange(n): out <<= 1 out ^= (crc >> ((n-1-k)^7)) & 1 if verbose: print "bit-reversed(%04x) = %04x" % (crc, out) crc = out # byte reverse if lsbyte_first: out = 0 for k in xrange((n+7)//8): out <<= 8 out ^= (crc >> (k*8)) & 0xff if verbose: print "byte-reversed(%04x) = %04x" % (crc, out) crc = out return crc def hexlify(s, separator=' '): return separator.join('%02x' % ord(c) for c in s) def bit_reverse(x,n): y = 0 for _ in xrange(n): y <<= 1 y |= (x&1) x >>= 1 return y def bit_reverse_in_bytes(x,n): """ bit-reverse each byte in x""" y = 0 for k in xrange(n): y |= bit_reverse(x>>(8*k), 8) << (8*k) return y def byte_reverse(x,n): y = 0 for _ in xrange(n): y <<= 8 y |= (x&0xff) x >>= 8 return y def optional_reverser(x,nbits, reverse_bits_in_bytes, reverse_bytes): if reverse_bytes: if reverse_bits_in_bytes: return bit_reverse(x,nbits) else: return byte_reverse(x,(nbits+7)//8) else: # bytes are in the same order if reverse_bits_in_bytes: return bit_reverse_in_bytes(x,(nbits+7)//8) else: return x def bkm(bits, sink=None): """ Berlekamp-Massey algorithm sink: used to store intermediate results for debugging or analysis """ m = -1 c = 1 b = 1 L = 0 fullbits = 0 nbits = len(bits) reversed_bits = 0 for n in xrange(nbits): reversed_bits = (reversed_bits << 1) | (1 if bits[n] else 0) d = parity(reversed_bits&c) if sink is not None: sink.append(dict(n=n, m=m, L=L, b=b, c=c, d=d, reversed_bits=reversed_bits)) if d == 1: # Discrepancy! cprev = c c ^= b << (n-m) if 2*L <= n: L = n+1 - L b = cprev m = n if sink is not None: sink.append(dict(n=None, m=m, L=L, b=b, c=c)) return c,L
class MysteryCrc(object): def __init__(self, poly, init, final, lsbit_first, lsbyte_first, name=None): self.poly = poly self.init = init self.final = final self.lsbit_first = lsbit_first self.lsbyte_first = lsbyte_first self.name = name def __call__(self, message): return crc_bitwise(message, self.poly, init=self.init, final=self.final, lsbit_first=self.lsbit_first, lsbyte_first=self.lsbyte_first) def __repr__(self): return ('MysteryCrc(poly=0x%x,init=0x%x,final=0x%x,lsbit_first=%s,lsbyte_first=%s%s)' % (self.poly, self.init, self.final, self.lsbit_first, self.lsbyte_first, '' if self.name is None else ',name=%s' % repr(self.name))) mystery_crc_1 = MysteryCrc(0x11d, init=0x56, final=0x78, lsbit_first=False, lsbyte_first=False) mystery_crc_2 = MysteryCrc(0x11d, init=0x56, final=0x78, lsbit_first=True, lsbyte_first=False) mystery_crc_3 = MysteryCrc(0x169, init=0x89, final=0xff, lsbit_first=True, lsbyte_first=False) mystery_crc_4 = MysteryCrc(0x1100b, init=0xffff, final=0xffff, lsbit_first=False, lsbyte_first=False) mystery_crc_5 = MysteryCrc(0x1100b, init=0xffff, final=0xffff, lsbit_first=True, lsbyte_first=False) mystery_crc_6 = MysteryCrc(0x103dd, init=0xffff, final=0xffff, lsbit_first=False, lsbyte_first=False) mystery_crc_7 = MysteryCrc(0x103dd, init=0xffff, final=0xffff, lsbit_first=True, lsbyte_first=False) mystery_crc_8 = MysteryCrc(0x103dd, init=0xffff, final=0xffff, lsbit_first=False, lsbyte_first=True) mystery_crc_9 = MysteryCrc(0x103dd, init=0xffff, final=0xffff, lsbit_first=True, lsbyte_first=True)
messages = ['\x00\x00', '\x00\x01', '\x00\x02', '\x00\x03', '\x55\xaa', '\xaa\x01', '\xff\xab'] crcs = [mystery_crc_1(msg) for msg in messages] c0 = crcs[0] print ("M(x) c(x) = z2(x) + cbar(x)") for message,c in zip(messages,crcs): print("%s -> 0x%04x = 0x%04x ^ 0x%04x" % (hexlify(message), c, c0, c ^ c0)) print ("") for i in [1,4]: print("0x%04x ^ 0x%04x = 0x%04x" % (crcs[i]^c0, crcs[i+1]^c0, crcs[i]^crcs[i+1]))
M(x) c(x) = z2(x) + cbar(x) 00 00 -> 0x0005 = 0x0005 ^ 0x0000 00 01 -> 0x0018 = 0x0005 ^ 0x001d 00 02 -> 0x003f = 0x0005 ^ 0x003a 00 03 -> 0x0022 = 0x0005 ^ 0x0027 55 aa -> 0x0049 = 0x0005 ^ 0x004c aa 01 -> 0x0066 = 0x0005 ^ 0x0063 ff ab -> 0x002a = 0x0005 ^ 0x002f 0x001d ^ 0x003a = 0x0027 0x004c ^ 0x0063 = 0x002f
This simplified CRC \( \bar{c}(x) = c(x) + z_m(x) = x^N M(x) \bmod g(x) \) is shown in the code output above as the cbar(x)
term on the right. It is linear: we can add messages 00 01 + 00 02 = 00 03
that yield simplified CRCs that add: 001d + 003a = 0027
, and add messages 55 aa + aa 01 = ff ab
that yield simplified CRCs 004c + 0063 = 002f
.
Determining the Polynomial, Part I: Basis Functions
This will give us one way to determine the CRC’s generator polynomial. We can probe the CRC by first feeding in the all-zero message, and then using it to compute \( \bar{c}(x) \) for messages \( M(x) \) which have a single one bit. This ends up computing \( x^{N+a} \bmod g(x) \) for various values of \( a \), which is the position of the one bit in the message. For example, with a four-byte message (\( m=4 \)) and \( a=2 \), then the test message is (in hex) 00 00 00 04
; with \( a=3 \) the test message is 00 00 00 08
. And so on.
import struct def calc_shift_count(k, lsbit_first=True): return (k&0xfff8) + 7 - (k&7) if lsbit_first else k def probe0(crcfunc, verbose=False): zerosamples = ['', '\x00', '\x00\x00', '\x00\x00\x00', '\x00\x00\x00\x00'] zero_crcs = [crcfunc(msg) for msg in zerosamples] if verbose: for msg,c in zip(zerosamples,zero_crcs): print("[%s] -> %04x" % (hexlify(msg), c)) return zero_crcs def probe1(crcfunc, lsbit_first=True, verbose=False, nprobe=32): fmt, nbytes = ('>I',4) if nprobe <= 32 else ('>Q',8) onesamples = [struct.pack(fmt,1<<calc_shift_count(k, lsbit_first)) for k in xrange(nprobe)] c0 = crcfunc('\x00'*nbytes) one_crcs = [crcfunc(msg) for msg in onesamples] for one, one_crc in zip(onesamples, one_crcs): if verbose: fmt = "[%s] -> %04x = crc(%s) + %04x lsb=%d" fmtargs = (hexlify(one), one_crc, ' '.join('00' for _ in xrange(nbytes)), one_crc ^ c0, (one_crc ^ c0)&1) if lsbit_first: fmt += ' (rev=%04x)' fmtargs += (bit_reverse_in_bytes(fmtargs[-2], 2),) print(fmt % fmtargs) return c0, one_crcs zero_crcs = probe0(mystery_crc_1, verbose=True) c0, one_crcs = probe1(mystery_crc_1, lsbit_first=False, verbose=True)
[] -> 00e8 [00] -> 0093 [00 00] -> 0005 [00 00 00] -> 00a0 [00 00 00 00] -> 0068 [00 00 00 01] -> 0075 = crc(00 00 00 00) + 001d lsb=1 [00 00 00 02] -> 0052 = crc(00 00 00 00) + 003a lsb=0 [00 00 00 04] -> 001c = crc(00 00 00 00) + 0074 lsb=0 [00 00 00 08] -> 0080 = crc(00 00 00 00) + 00e8 lsb=0 [00 00 00 10] -> 00a5 = crc(00 00 00 00) + 00cd lsb=1 [00 00 00 20] -> 00ef = crc(00 00 00 00) + 0087 lsb=1 [00 00 00 40] -> 007b = crc(00 00 00 00) + 0013 lsb=1 [00 00 00 80] -> 004e = crc(00 00 00 00) + 0026 lsb=0 [00 00 01 00] -> 0024 = crc(00 00 00 00) + 004c lsb=0 [00 00 02 00] -> 00f0 = crc(00 00 00 00) + 0098 lsb=0 [00 00 04 00] -> 0045 = crc(00 00 00 00) + 002d lsb=1 [00 00 08 00] -> 0032 = crc(00 00 00 00) + 005a lsb=0 [00 00 10 00] -> 00dc = crc(00 00 00 00) + 00b4 lsb=0 [00 00 20 00] -> 001d = crc(00 00 00 00) + 0075 lsb=1 [00 00 40 00] -> 0082 = crc(00 00 00 00) + 00ea lsb=0 [00 00 80 00] -> 00a1 = crc(00 00 00 00) + 00c9 lsb=1 [00 01 00 00] -> 00e7 = crc(00 00 00 00) + 008f lsb=1 [00 02 00 00] -> 006b = crc(00 00 00 00) + 0003 lsb=1 [00 04 00 00] -> 006e = crc(00 00 00 00) + 0006 lsb=0 [00 08 00 00] -> 0064 = crc(00 00 00 00) + 000c lsb=0 [00 10 00 00] -> 0070 = crc(00 00 00 00) + 0018 lsb=0 [00 20 00 00] -> 0058 = crc(00 00 00 00) + 0030 lsb=0 [00 40 00 00] -> 0008 = crc(00 00 00 00) + 0060 lsb=0 [00 80 00 00] -> 00a8 = crc(00 00 00 00) + 00c0 lsb=0 [01 00 00 00] -> 00f5 = crc(00 00 00 00) + 009d lsb=1 [02 00 00 00] -> 004f = crc(00 00 00 00) + 0027 lsb=1 [04 00 00 00] -> 0026 = crc(00 00 00 00) + 004e lsb=0 [08 00 00 00] -> 00f4 = crc(00 00 00 00) + 009c lsb=0 [10 00 00 00] -> 004d = crc(00 00 00 00) + 0025 lsb=1 [20 00 00 00] -> 0022 = crc(00 00 00 00) + 004a lsb=0 [40 00 00 00] -> 00fc = crc(00 00 00 00) + 0094 lsb=0 [80 00 00 00] -> 005d = crc(00 00 00 00) + 0035 lsb=1
qr = GF2QuotientRing(0x11d) for k in xrange(16): print '%04x' % qr.lshiftraw(1,k+8)
001d 003a 0074 00e8 00cd 0087 0013 0026 004c 0098 002d 005a 00b4 0075 00ea 00c9
Now, the interesting thing here is that we can take one particular bit of the simplified CRC — say, the least significant bit — and we get an output bit sequence from an LFSR. This means we can use the Berlekamp-Massey algorithm to figure out the polynomial:
lsbs = [(c0^c)&1 for c in one_crcs] poly1,n = bkm(list(reversed(lsbs))) print '%x' % poly1
11d
Easy-peasy! The only problem is that we don’t know whether the CRC is LSB-first or not. If it is, then, for example, the probe message for \( a=0 \) is 00 00 00 80
rather than 00 00 00 01
, and the probe message for \( a=1 \) is 00 00 00 40
rather than 00 00 00 02
. If we don’t know whether the CRC is LSB-first or not, we have two different sequences to plug into the Berlekamp-Massey algorithm. One of them will compute the polynomial, and the other will compute a different polynomial which is of higher degree.
_ = probe0(mystery_crc_2, verbose=True) _ = probe1(mystery_crc_2, lsbit_first=True, verbose=True)
[] -> 0017 [00] -> 00c9 [00 00] -> 00a0 [00 00 00] -> 0005 [00 00 00 00] -> 0016 [00 00 00 80] -> 00ae = crc(00 00 00 00) + 00b8 lsb=0 (rev=001d) [00 00 00 40] -> 004a = crc(00 00 00 00) + 005c lsb=0 (rev=003a) [00 00 00 20] -> 0038 = crc(00 00 00 00) + 002e lsb=0 (rev=0074) [00 00 00 10] -> 0001 = crc(00 00 00 00) + 0017 lsb=1 (rev=00e8) [00 00 00 08] -> 00a5 = crc(00 00 00 00) + 00b3 lsb=1 (rev=00cd) [00 00 00 04] -> 00f7 = crc(00 00 00 00) + 00e1 lsb=1 (rev=0087) [00 00 00 02] -> 00de = crc(00 00 00 00) + 00c8 lsb=0 (rev=0013) [00 00 00 01] -> 0072 = crc(00 00 00 00) + 0064 lsb=0 (rev=0026) [00 00 80 00] -> 0024 = crc(00 00 00 00) + 0032 lsb=0 (rev=004c) [00 00 40 00] -> 000f = crc(00 00 00 00) + 0019 lsb=1 (rev=0098) [00 00 20 00] -> 00a2 = crc(00 00 00 00) + 00b4 lsb=0 (rev=002d) [00 00 10 00] -> 004c = crc(00 00 00 00) + 005a lsb=0 (rev=005a) [00 00 08 00] -> 003b = crc(00 00 00 00) + 002d lsb=1 (rev=00b4) [00 00 04 00] -> 00b8 = crc(00 00 00 00) + 00ae lsb=0 (rev=0075) [00 00 02 00] -> 0041 = crc(00 00 00 00) + 0057 lsb=1 (rev=00ea) [00 00 01 00] -> 0085 = crc(00 00 00 00) + 0093 lsb=1 (rev=00c9) [00 80 00 00] -> 00e7 = crc(00 00 00 00) + 00f1 lsb=1 (rev=008f) [00 40 00 00] -> 00d6 = crc(00 00 00 00) + 00c0 lsb=0 (rev=0003) [00 20 00 00] -> 0076 = crc(00 00 00 00) + 0060 lsb=0 (rev=0006) [00 10 00 00] -> 0026 = crc(00 00 00 00) + 0030 lsb=0 (rev=000c) [00 08 00 00] -> 000e = crc(00 00 00 00) + 0018 lsb=0 (rev=0018) [00 04 00 00] -> 001a = crc(00 00 00 00) + 000c lsb=0 (rev=0030) [00 02 00 00] -> 0010 = crc(00 00 00 00) + 0006 lsb=0 (rev=0060) [00 01 00 00] -> 0015 = crc(00 00 00 00) + 0003 lsb=1 (rev=00c0) [80 00 00 00] -> 00af = crc(00 00 00 00) + 00b9 lsb=1 (rev=009d) [40 00 00 00] -> 00f2 = crc(00 00 00 00) + 00e4 lsb=0 (rev=0027) [20 00 00 00] -> 0064 = crc(00 00 00 00) + 0072 lsb=0 (rev=004e) [10 00 00 00] -> 002f = crc(00 00 00 00) + 0039 lsb=1 (rev=009c) [08 00 00 00] -> 00b2 = crc(00 00 00 00) + 00a4 lsb=0 (rev=0025) [04 00 00 00] -> 0044 = crc(00 00 00 00) + 0052 lsb=0 (rev=004a) [02 00 00 00] -> 003f = crc(00 00 00 00) + 0029 lsb=1 (rev=0094) [01 00 00 00] -> 00ba = crc(00 00 00 00) + 00ac lsb=0 (rev=0035)
for lsbit_first in [False, True]: c0, one_crcs = probe1(mystery_crc_2, lsbit_first=lsbit_first) poly2,n = bkm([(c0^c)&1 for c in reversed(one_crcs)]) print ('lsbit_first=%5s: computed generator polynomial g(x) = %x' % (lsbit_first, poly2))
lsbit_first=False: computed generator polynomial g(x) = c33 lsbit_first= True: computed generator polynomial g(x) = 11d
Berlekamp-Massey requires that for a polynomial of degree \( N \), you need at least \( 2N \) bits of a sequence to ensure that the result is correct. For an 8-bit CRC that means we need to try 17 different messages (all zeros, and 16 one-bit messages); for a 16-bit CRC we need to try 33 different messages.
We’ll show how to reduce the number of messages later, but for now let’s move on, because we have a way to determine the generator polynomial and whether the CRC takes in bits LSB-first or not. Next is the initial value \( p_i(x) \).
Determining the Initial and Final Values
Since we have the generator polynomial, we can use some finite field arithmetic to help figure out the initial value. Let’s feed in a couple of all-zero messages of various lengths, and compute the XOR between each message and the “empty” message of length zero:
zero_crcs = probe0(mystery_crc_1) for m, z in enumerate(zero_crcs): if m == 0: print('m=0: [] -> %04x' % z) else: print('m=%d: %13s -> %04x = %04x ^ %04x' % (m, '['+' '.join(['00']*m)+']', z, zero_crcs[0], z^zero_crcs[0]))
m=0: [] -> 00e8 m=1: [00] -> 0093 = 00e8 ^ 007b m=2: [00 00] -> 0005 = 00e8 ^ 00ed m=3: [00 00 00] -> 00a0 = 00e8 ^ 0048 m=4: [00 00 00 00] -> 0068 = 00e8 ^ 0080
If we go back to the CRC equation \( c(x) = \left(x^{N+8m}p_i(x) + x^N M(x) + p_f(x)\right) \bmod g(x) \), it turns out that \( z_m(x) \), which is the CRC of the all-zero message \( M(x)=0 \) of length \( m \), has the value \( z_m(x) = \left(x^{N+8m}p_i(x) + p_f(x)\right) \bmod g(x) \).
This means that \( z_m(x) + z_0(x) = x^N\left(x^{8m} + 1\right)p_i(x) \bmod g(x) \), and we should be able to determine \( p_i(x) \) from the empty message, which has CRC \( z_0(x), \) and the single-byte message 00
, which has CRC \( z_1(x) \):
$$p_i(x) = \frac{z_1(x) + z_0(x)}{x^N(x^8 + 1)} \bmod g(x)$$
qr = GF2QuotientRing(0x11d) # let's predict zm(x) + z0(x) init = 0x56 init_shifted = qr.lshiftraw(init,8) for m in xrange(5): print '%x' % (qr.lshiftraw(init_shifted,m*8) ^ init_shifted)
0 7b ed 48 80
init = qr.divraw(zero_crcs[1]^zero_crcs[0], 0x10100) print 'initial value = 0x%x' % init
initial value = 0x56
There! Now for the final value we can just solve \( z_0(x) = x^N p_i(x) + p_f(x) \) to get \( p_f(x) = x^N p_i(x) + z_0(x) \) :
final = qr.lshiftraw(init,8) ^ zero_crcs[0] print 'final value = 0x%x' % final
final value = 0x78
Now let’s tie this all together and automate it:
def reverse_engineer_crc_bkm(crcfunc, nprobe=32): """ Reverse-engineer CRC using Berlekamp-Massey """ zero_crcs = probe0(crcfunc) c0_t, one_crcs_t = probe1(crcfunc, lsbit_first=True, nprobe=nprobe) poly_t,n_t = bkm([(c0_t^c)&1 for c in reversed(one_crcs_t)]) c0_f, one_crcs_f = probe1(crcfunc, lsbit_first=False, nprobe=nprobe) poly_f,n_f = bkm([(c0_f^c)&1 for c in reversed(one_crcs_f)]) # We have two choices. See which one is consistent. qr = GF2QuotientRing(poly_t) polytail_t1 = one_crcs_t[0] ^ c0_t polytail_t0 = poly_t ^ (1<<n_t) if bit_reverse_in_bytes(polytail_t1, n_t//8) == polytail_t0: lsbit_first = True lsbyte_first = False poly,n = poly_t,n_t elif bit_reverse(polytail_t1, n_t) == polytail_t0: lsbit_first = True lsbyte_first = True poly,n = poly_t,n_t else: polytail_f1 = one_crcs_f[0] ^ c0_f polytail_f0 = poly_f ^ (1<<n_f) qr = GF2QuotientRing(poly_f) poly,n = poly_f,n_f lsbit_first = False if polytail_f1 == polytail_f0: lsbyte_first = False elif byte_reverse(polytail_f1, n_f//8) == polytail_f0: lsbyte_first = True else: raise ValueError('Could not find a matching polynomial') # Undo the bit/byte reversing, so we can compute the polynomials zero_crcs = [optional_reverser(z,n,reverse_bits_in_bytes=lsbit_first, reverse_bytes=lsbyte_first) for z in zero_crcs] qr = GF2QuotientRing(poly) # Now compute the initial and final values u = 0x101 init = qr.divraw(zero_crcs[1]^zero_crcs[0],qr.lshiftraw(u,n)) final = qr.lshiftraw(init,n) ^ zero_crcs[0] return poly, lsbit_first, lsbyte_first, init, final for crcfunc in [mystery_crc_1, mystery_crc_2, mystery_crc_3, mystery_crc_4, mystery_crc_5, mystery_crc_6, mystery_crc_7, mystery_crc_8, mystery_crc_9]: crcparams = reverse_engineer_crc_bkm(crcfunc) print("poly=0x%05x, lsbit_first=%5s, lsbyte_first=%5s, init=0x%04x, final=0x%04x" % crcparams) print(" detected from %s" % crcfunc) if crcparams == (crcfunc.poly, crcfunc.lsbit_first, crcfunc.lsbyte_first, crcfunc.init, crcfunc.final): print(" Match!") else: print(" Mismatch... hmmm")
poly=0x0011d, lsbit_first=False, lsbyte_first=False, init=0x0056, final=0x0078 detected from MysteryCrc(poly=0x11d,init=0x56,final=0x78,lsbit_first=False,lsbyte_first=False) Match! poly=0x0011d, lsbit_first= True, lsbyte_first=False, init=0x0056, final=0x0078 detected from MysteryCrc(poly=0x11d,init=0x56,final=0x78,lsbit_first=True,lsbyte_first=False) Match! poly=0x00169, lsbit_first= True, lsbyte_first=False, init=0x0089, final=0x00ff detected from MysteryCrc(poly=0x169,init=0x89,final=0xff,lsbit_first=True,lsbyte_first=False) Match! poly=0x1100b, lsbit_first=False, lsbyte_first=False, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x1100b,init=0xffff,final=0xffff,lsbit_first=False,lsbyte_first=False) Match! poly=0x1100b, lsbit_first= True, lsbyte_first=False, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x1100b,init=0xffff,final=0xffff,lsbit_first=True,lsbyte_first=False) Match! poly=0x103dd, lsbit_first=False, lsbyte_first=False, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x103dd,init=0xffff,final=0xffff,lsbit_first=False,lsbyte_first=False) Match! poly=0x103dd, lsbit_first= True, lsbyte_first=False, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x103dd,init=0xffff,final=0xffff,lsbit_first=True,lsbyte_first=False) Match! poly=0x103dd, lsbit_first=False, lsbyte_first= True, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x103dd,init=0xffff,final=0xffff,lsbit_first=False,lsbyte_first=True) Match! poly=0x103dd, lsbit_first= True, lsbyte_first= True, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x103dd,init=0xffff,final=0xffff,lsbit_first=True,lsbyte_first=True) Match!
Whee! We did it!
But We Don’t Really Need Berlekamp-Massey After All!
Okay, there’s a shorter approach to getting the generator polynomial. I mentioned in passing in Part XVI that there’s an easy way to get the generator polynomial for Reed-Solomon, which is to pass in a message \( M(x) = 1 \), and then what you get for a remainder is the generator polynomial without the leading monomial term. We can do the same thing with CRCs, computing \( \bar{c}(x) = c(M_0) + c(M_0 + 1) \):
_ = probe0(mystery_crc_1, verbose=True) print ("M(x) c(x) = z4(x) + cbar(x)") _ = probe1(mystery_crc_1, lsbit_first=False, verbose=True, nprobe=1)
[] -> 00e8 [00] -> 0093 [00 00] -> 0005 [00 00 00] -> 00a0 [00 00 00 00] -> 0068 M(x) c(x) = z4(x) + cbar(x) [00 00 00 01] -> 0075 = crc(00 00 00 00) + 001d lsb=1
We get 1d
, which is the “tail” of the polynomial 0x11d
, without its leading monomial term. No need to run Berlekamp-Massey!
Technically we can use any message \( M_0 \); it doesn’t have to be the all-zero polynomial, we just need to compute two CRCs, one with the least significant bit flipped.
Now, we also need a way to distinguish the LSB-first and MSB-first cases, and for this, we need a consistency check. It looks like the minimum number of messages to determine the generator polynomial and whether it is LSB-first or MSB-first, is 3:
[00]
– zero polynomial[01]
– the difference between this and[00]
should be the tail of the generator polynomial, if MSB-first[80]
– the difference between this and[00]
should be the tail of the generator polynomial, if LSB-first
When I first looked at this approach, it seemed to me like the magic number was 5, not 3, because we need a way to confirm consistency of MSB-first or LSB-first, but the [01]
and [80]
messages can each be used for consistency with each other, because one of them produces a value of \( \bar{c}(x) \) that is \( x^7 \) times the other.
Add in the empty message []
for determining the initial and final values, and we should be able to get all the parameters after getting the CRCs for 4 messages. (That is, with one small caveat, which we’ll show in a minute.)
Let’s do it!
from libgf2.gf2 import _gf2exteuc, _gf2divmod class NoSolutionsError(ValueError): pass class MultipleSolutionsError(ValueError): pass def reverse_engineer_crc(crcfunc, n=None, trace=None, nmax=64): """ Reverse-engineer CRC using only four test polynomials (except in rare cases) """ z0 = crcfunc('') z1 = crcfunc('\x00') zM = crcfunc('\x01') zL = crcfunc('\x80') def get_consistent_qr(n, allow_multiple_solutions=False): n8 = (n+7)//8 # Check for a consistent solution. solutions = [] for lsbit_first, lsbyte_first in [(False,False), (False,True), (True,False), (True,True)]: if n8 == 1 and lsbyte_first: continue # don't reverse bytes for a 1-byte case polytail = bit_reverse_in_bytes(zL ^ z1, n8) if lsbit_first else (zM ^ z1) othertail = bit_reverse_in_bytes(zM ^ z1, n8) if lsbit_first else (zL ^ z1) if lsbyte_first: polytail = byte_reverse(polytail, n8) othertail = byte_reverse(othertail, n8) g = (1 << n) ^ polytail qr = GF2QuotientRing(g) if othertail == qr.lshiftraw(1, n+7): solutions.append((qr, lsbit_first, lsbyte_first)) if not solutions: raise NoSolutionsError("Could not find a matching polynomial") if allow_multiple_solutions: return solutions if len(solutions) > 1: raise MultipleSolutionsError("Found multiple consistent solutions: %s" % solutions) return solutions[0] autodetect = n is None if autodetect: n = max(z.bit_length() for z in [z0, z1, zM, zL]) # minimum size of CRC, we need to double-check this while True: if trace is not None: trace.append(n) n8 = (n+7)//8 try: solutions = get_consistent_qr(n, allow_multiple_solutions=True) except ValueError: n += 1 if n >= nmax: raise else: continue if len(solutions) > 1: # eliminate polynomials with no unity coefficient solutions = [solution for solution in solutions if (solution[0].coeffs & 1)] if len(solutions) > 1: raise MultipleSolutionsError("Still have multiple solutions: %s" % solutions) qr, lsbit_first, lsbyte_first = solutions[0] # We have a possible solution. Now we just have to run some extra checks # on some new patterns: # - za = CRC(all-zero message of length n8) # - zb = CRC(x^-1) # # Checks are: # - lengths of the CRCs of these two messages # are <= n8 (otherwise we increase candidate n and repeat) # - za ^ zb == 1 << (N-1) za = crcfunc('\x00' * (n8)) n_next = za.bit_length() if n_next > n: n = n_next continue # Use x^-1 -- its CRC should be za ^ (1 << (N-1)) onehalf = qr.rshiftraw1(1) bytecodes = [(onehalf >> (i-1)*8) & 0xff for i in xrange(n8,0,-1)] if lsbit_first: bytecodes = [bit_reverse(b,8) for b in bytecodes] Mb = ''.join(chr(b) for b in bytecodes) zb = crcfunc(Mb) n_next = zb.bit_length() if n_next > n: n = n_next continue cbar = za ^ zb if lsbyte_first: if lsbit_first: if cbar == 1: break else: if cbar == 0x80: break else: if lsbit_first: if cbar == 1 << (16*n8 - 8 - n): break else: if cbar == 1 << (n-1): break n += 1 if n > nmax: raise ValueError("Can't find solution") # no autodetect else: n8 = (n+7)//8 qr, lsbit_first, lsbyte_first = get_consistent_qr(n) # Undo the bit/byte reversing, so we can compute the polynomials zero_crcs = [optional_reverser(z,n,reverse_bits_in_bytes=lsbit_first, reverse_bytes=lsbyte_first) for z in [z0,z1]] poly = qr.coeffs # Now compute the initial and final values a = 0x101 << n d,u,v = _gf2exteuc(a,poly) cbar = zero_crcs[1] ^ zero_crcs[0] e,r = _gf2divmod(cbar,d) assert r == 0 init = qr.mulraw(u,e) final = qr.lshiftraw(init,n) ^ zero_crcs[0] return poly, lsbit_first, lsbyte_first, init, final for crcfunc in [mystery_crc_1, mystery_crc_2, mystery_crc_3, mystery_crc_4, mystery_crc_5, mystery_crc_6, mystery_crc_7, mystery_crc_8, mystery_crc_9]: crcparams = reverse_engineer_crc(crcfunc) crcparams = reverse_engineer_crc_bkm(crcfunc) print("poly=0x%05x, lsbit_first=%5s, lsbyte_first=%5s, init=0x%04x, final=0x%04x" % crcparams) print(" detected from %s" % crcfunc) if crcparams == (crcfunc.poly, crcfunc.lsbit_first, crcfunc.lsbyte_first, crcfunc.init, crcfunc.final): print(" Match!") else: print(" Mismatch... hmmm")
poly=0x0011d, lsbit_first=False, lsbyte_first=False, init=0x0056, final=0x0078 detected from MysteryCrc(poly=0x11d,init=0x56,final=0x78,lsbit_first=False,lsbyte_first=False) Match! poly=0x0011d, lsbit_first= True, lsbyte_first=False, init=0x0056, final=0x0078 detected from MysteryCrc(poly=0x11d,init=0x56,final=0x78,lsbit_first=True,lsbyte_first=False) Match! poly=0x00169, lsbit_first= True, lsbyte_first=False, init=0x0089, final=0x00ff detected from MysteryCrc(poly=0x169,init=0x89,final=0xff,lsbit_first=True,lsbyte_first=False) Match! poly=0x1100b, lsbit_first=False, lsbyte_first=False, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x1100b,init=0xffff,final=0xffff,lsbit_first=False,lsbyte_first=False) Match! poly=0x1100b, lsbit_first= True, lsbyte_first=False, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x1100b,init=0xffff,final=0xffff,lsbit_first=True,lsbyte_first=False) Match! poly=0x103dd, lsbit_first=False, lsbyte_first=False, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x103dd,init=0xffff,final=0xffff,lsbit_first=False,lsbyte_first=False) Match! poly=0x103dd, lsbit_first= True, lsbyte_first=False, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x103dd,init=0xffff,final=0xffff,lsbit_first=True,lsbyte_first=False) Match! poly=0x103dd, lsbit_first=False, lsbyte_first= True, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x103dd,init=0xffff,final=0xffff,lsbit_first=False,lsbyte_first=True) Match! poly=0x103dd, lsbit_first= True, lsbyte_first= True, init=0xffff, final=0xffff detected from MysteryCrc(poly=0x103dd,init=0xffff,final=0xffff,lsbit_first=True,lsbyte_first=True) Match!
Now, I’ve handwaved over a few important things.
One is that we need to know \( N \), the size of the CRC in bits. If you know that, we’re done. Otherwise it’s not necessarily obvious from getting CRC values of these four cases. We can get a lower bound for \( N \) by looking at the maximum bit size of the CRC values returned by each of these four test polynomials. Unfortunately, in some cases it doesn’t necessarily tell you what \( N \) is.
A reliable way to verify the generating polynomial \( g(x) \) is correct along with its degree \( N \), is to compute \( \bar{c}(x) = f_m(M_0(x) + H(x)) + f_m(M_0(x)) \) for some arbitrary \( M_0(x) \) (the message of \( m = \lfloor N/8 \rfloor \) zeros will do), and the particular polynomial \( H(x) = x^{-1} \bmod g(x) \), because then \( \bar{c}(x) = x^Nx^-1 = x^{N-1} \) which is a CRC consisting of a 1 bit and \( N-1 \) zero bits.
My reverse_engineer_crc()
function does this, increasing the value of \( N \) until we get a verification of this type.
mystery_crc_10 = MysteryCrc(0x1000000af, init=0xde, final=0xad, lsbit_first=False, lsbyte_first=False)
trace_n = [] a,b,c,d,e=reverse_engineer_crc(mystery_crc_10, trace=trace_n) print('poly=%x lsbit_first=%s lsbyte_first=%s init=%x final=%x' % (a,b,c,d,e)) print("Sequence of N values tried during autodetection: %s" % trace_n)
poly=1000000af lsbit_first=False lsbyte_first=False init=de final=ad Sequence of N values tried during autodetection: [23, 30, 32]
The second handwavy thing is that we’re assuming that we can tell whether the CRC is LSB-first or not and whether the bytes are reversed or not, because if we just try all four possibilities, only one will work and the rest will not. But I don’t have any proof that this is true, that there aren’t any cases where we can get consistent results for both LSB-first and MSB-first interpretations of the CRC. I’m not going to worry about that. If you find such a counterexample, let me know.
There’s a third handwavy thing; we’ll get to it in a minute.
CRCs in the Wild
So far we’ve been analyzing CRC oracles that we created. That’s a good start, but let’s try using these techniques on real systems that are out there in the wild.
We’ll start with zlib.crc32
in the Python standard library.
import zlib def crc32_zlib(x): return zlib.crc32(x) & 0xffffffff a,b,c,d,e = reverse_engineer_crc(crc32_zlib) print('poly=%x lsbit_first=%s lsbyte_first=%s init=%x final=%x' % (a,b,c,d,e))
poly=104c11db7 lsbit_first=True lsbyte_first=True init=46af6449 final=ffffffff
The zlib
module’s CRC32 algorithm is the same as the one used in Ethernet, PNGs, and PKZIP. If you look at the PNG code for CRC, you’ll see that it is initialized with 0xffffffff
and has a final XOR value of 0xffffffff
. Why doesn’t the initial value match?
Part of the reason is a difference in definition. Our definition is that we calculate \( c(x) = \left(x^{N+8m}p_i(x) + x^N M(x) + p_f(x)\right) \bmod g(x) \) — effectively, the initial value \( p_i(x) \) is a phantom input concatenated with the message bytes, and it gets shifted \( N \) times before the message bytes get shifted in. We can avoid this shift (which the PNG code does) by pre-shifting the initial value. If we do this, it can no longer be considered a concatenation with the message bytes, but in the case of the PNG code it does lead to a simple value:
qr = GF2QuotientRing(0x104c11db7) '%8x' % qr.lshiftraw(0x46af6449,32)
'ffffffff'
The other reason is that we are using a CRC convention where we always shift left, and in transforming the CRC32 algorithm — which is constructed using shift-right operations — into this form, we had to reverse the polynomial, as I mentioned earlier, from 0x1db710641
to 0x104c11db7
, and the initial value, depending on what form it is expressed in, can change as a result. I suspect that in the form we use, where we precede the message by phantom content, the initial value does not change if we swap implementations to use a right-shift with a reversed polynomial and reversed output bits, because both implementations produce the same output from the same input — but I’m not completely sure.
Now let’s have some more fun and see what happens when we treat Lammert Bies’s website as the oracle. You can type in stuff by hand if you like, but I’m going to automate it.
import urllib import urllib2 from IPython.core.display import HTML def fetch_crcs_from_lammerts_website(msg, use_ascii=False): """ Get CRCs from this URL: https://www.lammertbies.nl/comm/info/crc-calculation.php?crc=%22HEXDIGITS%22&method=hex where HEXDIGITS is replaced by the message, encoded as hex. This returns an HTML table. """ url = 'https://www.lammertbies.nl/comm/info/crc-calculation.php' response = urllib2.urlopen(url + '?' + urllib.urlencode(dict( crc=msg if use_ascii else ('"' + msg.encode('hex') + '"'), method='ascii' if use_ascii else 'hex'))) try: return response.read() finally: response.close() HTML(fetch_crcs_from_lammerts_website('\x00\x01'))
"0001" (hex) | |
---|---|
1 byte checksum | 1 |
CRC-16 | 0xC0C1 |
CRC-16 (Modbus) | 0x70C0 |
CRC-16 (Sick) | 0x0100 |
CRC-CCITT (XModem) | 0x1021 |
CRC-CCITT (0xFFFF) | 0x0D2E |
CRC-CCITT (0x1D0F) | 0x94E1 |
CRC-CCITT (Kermit) | 0x8911 |
CRC-DNP | 0xA1C9 |
CRC-32 | 0x36DE2269 |
from bs4 import BeautifulSoup def parse_rows_from_lammerts_website(html): """ Return the CRC and name from HTML in the form used in Lammert's website. The first row is the header and the second is a useless checksum. """ soup = BeautifulSoup(html, 'lxml') rows = soup.find_all('tr') return [(int(row.find_next('td').find_next('td').b.text, 16), row.find_next('td').text) for row in rows[2:]] for crc, name in parse_rows_from_lammerts_website( fetch_crcs_from_lammerts_website('\x00\x01')): print '%x %s' % (crc, name)
c0c1 CRC-16 70c0 CRC-16 (Modbus) 100 CRC-16 (Sick) 1021 CRC-CCITT (XModem) d2e CRC-CCITT (0xFFFF) 94e1 CRC-CCITT (0x1D0F) 8911 CRC-CCITT (Kermit) a1c9 CRC-DNP 36de2269 CRC-32
def create_oracle_from_lammerts_website(crcnum, apply_name=False): def oracle(msg, include_name=False): row = parse_rows_from_lammerts_website( fetch_crcs_from_lammerts_website(msg))[crcnum] if apply_name: oracle.__name__ = str(row[1]) return row if include_name else row[0] return oracle crc32_lammert = create_oracle_from_lammerts_website(8) print '%08x' % crc32_lammert('\x00\x01')
36de2269
a,b,c,d,e=reverse_engineer_crc(crc32_lammert) print("poly=%x, lsbit_first=%s, lsbyte_first=%s, init=%x, final=%x" % (a,b,c,d,e))
poly=104c11db7, lsbit_first=True, lsbyte_first=True, init=46af6449, final=ffffffff
Like Robinson Crusoe, It’s Primitive as Can Be
Now back to the third handwavy thing that we’ve glossed over as we reverse-engineered the CRC, and it’s something we will often encounter when working with CRC16 variants.
I’m going to use some code for the 16-bit X.25 CRC that I covered in an earlier article.
class CRC_X25(object): def __init__(self): poly = 0x1021 self.poly = poly self.brevtable, self.table = self.create_tables() def create_tables(self): brevtable = [] # Bit-reverse table for bytes table = [] # Each element j of the table has a value # of x^16*p{j} mod poly, where p{j} is # the GF(2) polynomial interpretation # of the bits of integer j for j in xrange(256): bv0 = j brev = 0 bval = j << 8 for k in xrange(8): bval = self._left_shift(bval) brev <<= 1 if bv0 & 1: brev |= 1 bv0 >>= 1 table.append(bval) brevtable.append(brev) return tuple(brevtable), tuple(table) def _left_shift(self, bval): bval <<= 1 if bval & 0x10000: bval ^= 0x10000 bval ^= self.poly return bval def calc_bitwise(self, data): crc = 0xFFFF for b in data: bval = ord(b) for k in xrange(8): crc = self._left_shift(crc) if bval & (1 << k): crc ^= self.poly crc ^= 0xFFFF return chr(self.brevtable[crc >> 8]) + chr(self.brevtable[crc & 0xff]) def calc_table(self, data): crc = 0xFFFF for b in data: bval = self.brevtable[ord(b)] crc = ((crc << 8) & 0xFFFF) ^ self.table[((crc >> 8) ^ bval) & 0x00FF]; crc ^= 0xFFFF return chr(self.brevtable[crc >> 8]) + chr(self.brevtable[crc & 0xff]) def __call__(self, data): return self.calc_table(data) crc_x25_obj = CRC_X25() def crc_x25(message): crc_binary = crc_x25_obj(message) y = 0 for c in crc_binary: y <<= 8 y ^= ord(c) return y z0 = crc_x25('') z1 = crc_x25('\x00') zM = crc_x25('\x01') zL = crc_x25('\x80') for z,name in [(z0,'z0'), (z1,'z1'), (zM,'zM'), (zL,'zL')]: print '%s=%04x' % (name,z) print("zL ^ z1 = %04x" % (z1^zL)) print("bit-reversed within bytes: %04x" % bit_reverse_in_bytes(z1^zL,2))
z0=0000 z1=78f0 zM=f1e1 zL=7074 zL ^ z1 = 0884 bit-reversed within bytes: 1021
We can get the polynomial with no problem; the tail 0x1021
(entire polynomial 0x11021
) is consistent with the generator polynomial \( g(x) = x^{16} + x^{12} + x^5 + 1 \) mentioned in X.25 section 2.2.7.4.
The problem comes when we try to tease out the initial and final values.
g = 0x11021 qr = GF2QuotientRing(g) n = qr.degree a = 0x101 cbar = bit_reverse_in_bytes(z1^z0,2) init = qr.divraw(cbar,qr.lshiftraw(a,n)) init
--------------------------------------------------------------------------- ValueError Traceback (most recent call last)in () 4 a = 0x101 5 cbar = bit_reverse_in_bytes(z1^z0,2) ----> 6 init = qr.divraw(cbar,qr.lshiftraw(a,n)) 7 init /Users/jmsachs/storage/jms/hobbies/blogs/embedded/repo/scripts/libgf2/libgf2/gf2.pyc in divraw(self, a, b) 774 If `b` and the characteristic polynomial have a common factor. 775 """ --> 776 return self.mulraw(a,self.invraw(b)) 777 def div(self, a, b): 778 """ /Users/jmsachs/storage/jms/hobbies/blogs/embedded/repo/scripts/libgf2/libgf2/gf2.pyc in invraw(self, a) 730 (r,y,_) = _GF2.exteuc(a,self.coeffs) 731 if r != 1: --> 732 raise ValueError('%x and %x are not relatively prime but have a common factor of %x' % (a,self.coeffs,r)) 733 return y 734 def inv(self, a): ValueError: 2310 and 11021 are not relatively prime but have a common factor of 3
It turns out that \( g(x) = x^{16} + x^{12} + x^5 + 1 \) is not a primitive polynomial: we can factor \( g(x) = (x+1)(x^{15}+x^{14}+x^{13}+x^{12}+x^4+x^3+x^2+x+1) \) and this \( (x+1) \) factor is a mess. The best we can do to isolate the initial and final values is to XOR the CRC of the empty message with the CRC of the zero message of \( k \) bytes, and this produces values of the form \( \bar{c}(x) = x^N(x^{8k} + 1)p_i(x) \). So we’d like to divide out the \( x^{8k} + 1 \) factor, but no matter what value of \( k \) we choose, it’s divisible by \( x+1 \), so it has no multiplicative inverse modulo \( g(x) \), and we get the error above if we try to calculate it.
(In technical terms: when \( g(x) \) is not a primitive polynomial, then \( GF(2)[x]/g(x) \) is a quotient ring but not a finite field, and not all nonzero elements have multiplicative inverses. In my libgf2
package, libgf2.gf2.GF2QuotientRing
can be used to compute in these quotient rings whether or not they are fields, although for methods that involve multiplicative inverses, some of the assumptions break down.)
To get past this, we have to pull out a little bit of grungy algebra again.
We’d like to solve \( \bar{c} = ap_i \bmod g \) for \( p_i \), where \( a = (x^{8k} + 1)x^N \). (I’m dropping the \( (x) \) part for a moment — these are all polynomials) This may or may not have a solution, and if it does, it’s equivalent to finding \( p_i \) and \( z \) such that \( \bar{c} = ap_i + gz \). In Part III we looked at the extended Euclidean algorithm, which, given \( a \) and \( b \), calculates \( u \) and \( v \) such that \( au+bv = \gcd(a,b) \). This looks very similar, and in fact, if our equation does have a solution, we can calculate it as follows:
$$\begin{aligned} d &= \gcd(a,g) = au+gv \cr e &= \bar{c}/d \cr p_i &= ue \end{aligned}$$
with \( d, u, \) and \( v \) coming from the extended Euclidean algorithm, because then we have \( ap_i + gve = aue + gve = de = \bar{c} \).
If \( \bar{c}(x) \) is divisible by \( d(x) \), we have solutions. That’s multiple solutions: we can use \( e(x) = \bar{c}(x)/d(x) + wg(x)/d(x) \) for any \( w \), because in the calculation of \( \bar{c}(x) = d(x)e(x) \) we will find that the \( w \) term is zero \( \bmod g(x) \).
If \( \bar{c}(x) \) is not divisible by \( d(x) \), there are no solutions — and we probably made an error in calculating the CRC.
Enough with the algebra, let’s use it for computation.
from libgf2.gf2 import _gf2mul, _gf2divmod, _gf2exteuc a = 0x101 << n d,u,v = _gf2exteuc(a,g) e,r = _gf2divmod(cbar,d) assert r == 0 init1 = qr.mulraw(u,e) # Let's find another possible solution with w=1 gcofactor, r = _gf2divmod(g,d) assert r == 0 init2 = init1 ^ gcofactor final1 = z0 ^ qr.lshiftraw(init1, n) final2 = z0 ^ qr.lshiftraw(init2, n) crc_x25_dup1 = MysteryCrc(g, init=init1, final=final1, lsbit_first=True, lsbyte_first=False) crc_x25_dup2 = MysteryCrc(g, init=init2, final=final2, lsbit_first=True, lsbyte_first=False) for crcfunc in [crc_x25, crc_x25_dup1, crc_x25_dup2]: print crcfunc for message in ['','Hi', 'Hello', 'Ernie, you have a banana in your ear!']: print " message='%s'" % message print " crc=%04x" % crcfunc(message)
<function crc_x25 at 0x10ab60d70> message='' crc=0000 message='Hi' crc=2679 message='Hello' crc=2c54 message='Ernie, you have a banana in your ear!' crc=3fc0 MysteryCrc(poly=0x11021,init=0x84cf,final=0xffff,lsbit_first=True,lsbyte_first=False) message='' crc=0000 message='Hi' crc=2679 message='Hello' crc=2c54 message='Ernie, you have a banana in your ear!' crc=3fc0 MysteryCrc(poly=0x11021,init=0x74d0,final=0xfe0,lsbit_first=True,lsbyte_first=False) message='' crc=0000 message='Hi' crc=2679 message='Hello' crc=2c54 message='Ernie, you have a banana in your ear!' crc=3fc0
Perfect!
The reverse_engineer_crc()
function can do this for us (although it doesn’t calculate alternate solutions):
a,b,c,d,e=reverse_engineer_crc(crc_x25) print("poly=%x, lsbit_first=%s, lsbyte_first=%s, init=%x, final=%x" % (a,b,c,d,e))
poly=11021, lsbit_first=True, lsbyte_first=False, init=84cf, final=ffff
Now let’s try some more from lammertbies.nl:
def reverse_engineer_crc_from_lammerts_website(crcnum, name=None): oracle = create_oracle_from_lammerts_website(crcnum, apply_name=name is None) a,b,c,d,e=reverse_engineer_crc(oracle) oracle_mimic = MysteryCrc(a,lsbit_first=b,lsbyte_first=c, init=d,final=e, name=oracle.__name__ if name is None else name) return oracle_mimic message = "Ernie, you have a banana in your ear!" for row in xrange(9): if row == 2: continue # Row 2 'CRC-16 (Sick)' didn't work for some reason crc_mimic = reverse_engineer_crc_from_lammerts_website(row) print "row=%d: %s" % (row, crc_mimic) print "crc(%s) = %x" % (message, crc_mimic(message))
row=0: MysteryCrc(poly=0x18005,init=0x0,final=0x0,lsbit_first=True,lsbyte_first=True,name='CRC-16') crc(Ernie, you have a banana in your ear!) = 4d55 row=1: MysteryCrc(poly=0x18005,init=0xeaa8,final=0x0,lsbit_first=True,lsbyte_first=True,name='CRC-16 (Modbus)') crc(Ernie, you have a banana in your ear!) = bd44 row=3: MysteryCrc(poly=0x11021,init=0x0,final=0x0,lsbit_first=False,lsbyte_first=False,name='CRC-CCITT (XModem)') crc(Ernie, you have a banana in your ear!) = e2e8 row=4: MysteryCrc(poly=0x11021,init=0x84cf,final=0x0,lsbit_first=False,lsbyte_first=False,name='CRC-CCITT (0xFFFF)') crc(Ernie, you have a banana in your ear!) = 5641 row=5: MysteryCrc(poly=0x11021,init=0xffff,final=0x0,lsbit_first=False,lsbyte_first=False,name='CRC-CCITT (0x1D0F)') crc(Ernie, you have a banana in your ear!) = 14e2 row=6: MysteryCrc(poly=0x11021,init=0x0,final=0x0,lsbit_first=True,lsbyte_first=False,name='CRC-CCITT (Kermit)') crc(Ernie, you have a banana in your ear!) = edaa row=7: MysteryCrc(poly=0x13d65,init=0x0,final=0xffff,lsbit_first=True,lsbyte_first=False,name='CRC-DNP') crc(Ernie, you have a banana in your ear!) = 448d row=8: MysteryCrc(poly=0x104c11db7,init=0x46af6449,final=0xffffffff,lsbit_first=True,lsbyte_first=True,name='CRC-32') crc(Ernie, you have a banana in your ear!) = c3c2cd3f
There, we figured out all the different varieties. (Except row 2, CRC-16 (Sick), which seemed to find the polynomial \( x^{16} + 1 \) when I tried it, and caused further errors. I’m going to have to let that one go.)
HTML(fetch_crcs_from_lammerts_website(message, use_ascii=True))
"Ernie, you have a banana in your ear!" | |
---|---|
1 byte checksum | 193 |
CRC-16 | 0x4D55 |
CRC-16 (Modbus) | 0xBD44 |
CRC-16 (Sick) | 0x8DB1 |
CRC-CCITT (XModem) | 0xE2E8 |
CRC-CCITT (0xFFFF) | 0x5641 |
CRC-CCITT (0x1D0F) | 0x14E2 |
CRC-CCITT (Kermit) | 0xEDAA |
CRC-DNP | 0x448D |
CRC-32 | 0xC3C2CD3F |
There’s not much else to say here. We’ve shown how to compute the parameters of a CRC just from looking at its output if we control the input. There are also ways to compute the CRC parameters in some cases where the input is constrained — for example, if you can only send messages of the form CA FE BA BE DE AD BE EF xx yy zz ww 60 0D F0 0D
but you can choose any values for the xx yy zz ww
bytes — although I won’t be covering them, and it’s a good exercise to try after reading this article.
That’s all the time we have for today!
Wrapup
We looked at reverse-engineering CRCs. In gory detail. With finite field algebra. I would like to say this wasn’t that complicated to understand, but there were some subtleties along the way that made the subject… interesting — and I hope it wasn’t too far out of reach to follow.
In a nutshell, here are the areas this article covered:
- The parameters needed to characterize a CRC, in particular:
- the generating polynomial \( g(x) \)
- whether the bits in each byte are interpreted (and output) least-significant-bit first
- whether the bytes are output least-significant-byte first
- the initial value \( p_i(x) \) prepended to the input data
- the final value \( p_f(x) \) XORed after the input data has been processed
- The finite field equivalent of the CRC calculation, namely \( c(x) = f_m(M(x)) = \left(x^{N+8m}p_i(x) + x^N M(x) + p_f(x)\right) \bmod g(x) \) where \( M(x) \) is the input data, expressed as a polynomial with one coefficient for each bit
- A simple Python class (
MysteryCrc
) that can accommodate all these variants — although I should note that I only tested it for polynomials that have a degree of a multiple of 8, so if you are working with a 17-bit CRC don’t expect it to work for you - A CRC is effectively an affine function (which is a pure linear function with a fixed offset) for fixed-length input data
- Information can be probed from a CRC “oracle” by using basis functions consisting of all-zero messages (to help eliminate the affine offsets in subsequent computations) along with messages of a fixed length that are all zeros except for a single
1
bit. These single-one-bit messages have a finite field representation of monomials — for example the message that has a1
bit in bit 14 can be represented as \( x^{14} \). (How you count bits depend on whether the CRC parameters are least-significant-bit first.) - The generating polynomial can be discovered by using a sequence of messages of successive powers of \( x \), and using a single bit of the resulting CRC to form a bit sequence, which can be fed into the Berlekamp-Massey algorithm. This requires \( 2N \) messages, in addition to an all-zero message, to determine a polynomial of degree \( N \).
- Once the generating polynomial is known, initial and final values can be determined by submitting the empty message (no bytes) and the single-byte
00
message, and using some fairly simple finite field arithmetic - The two polarity parameters (least-significant-bit first and least-significant-byte first) don’t have any shortcut aside from trying all possibilities and checking consistency.
- Berlekamp-Massey isn’t necessary to determine the polynomial; any two messages that differ in the least significant bit are sufficient to find the generating polynomial, by XORing the CRC of each message. Full parameter determination in the case of unknown bit polarity requires at least four messages, the empty message, and the three single-byte messages
00
,01
, and80
. There are cases if the CRC length is not known where additional CRCs must be computed to confirm CRC length. - We used some real-world test cases: CRC-32 from
zlib.crc32
, the X.25 CRC discussed in another article, and the CRCs calculated on Lammert Bies’s website. - We discussed how to deal with generating polynomials that are non-primitive — these may have multiple solutions for initial/final value parameters, and require use of the extended Euclidean algorithm to get around the fact that quotient rings (unlike finite fields) do not have inverses for all nonzero elements.
Next time we’ll show how to find lots of primitive polynomials.
© 2018 Jason M. Sachs, all rights reserved.
- Comments
- Write a Comment Select to add a comment
For what it's worth, the Linux documentation also has an interesting description of CRC computation (although not on reverse-engineering):
https://github.com/torvalds/linux/blob/master/Docu...
To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.
Please login (on the right) if you already have an account on this platform.
Otherwise, please use this form to register (free) an join one of the largest online community for Electrical/Embedded/DSP/FPGA/ML engineers: