# Linear Feedback Shift Registers for the Uninitiated, Part VI: Sing Along with the Berlekamp-Massey Algorithm

October 18, 2017

The last two articles were on discrete logarithms in finite fields — in practical terms, how to take the state $S$ of an LFSR and its characteristic polynomial $p(x)$ and figure out how many shift steps are required to go from the state 000...001 to $S$. If we consider $S$ as a polynomial bit vector such that $S = x^k \bmod p(x)$, then this is equivalent to the task of figuring out $k$ from $S$ and $p(x)$.

This time we’re tackling something different: how to determine the characteristic polynomial $p(x)$ from the LFSR’s output. There’s a very simple algorithm called the Berlekamp-Massey algorithm, that will do this. That is, the Berlekamp-Massey algorithm is very simple to implement. It’s not so easy to explain why it works. I’ll be able to explain most of it. You can try reading James Massey’s original paper for the rest, but it’s heavy on the abstract algebra, at least for an amateur mathematician. Theorems! Bah! Those are so 17th-century....

## Sing Along with $H_{25}$ and $H_D$

Okay, let’s go back to a 5-bit LFSR that we’ve seen before, with characteristic polynomial $p(x) = x^5 + x^2 + 1$. This LFSR can be represented by the quotient ring $H_{25} = GF(2)[x]/(x^5 + x^2 + 1)$:

You know how the output goes, right? Sing along when you remember the tune....

from libgf2.gf2 import GF2QuotientRing

def singalong(field, nbits, initial_state=1):
if field.degree == 0:
return ('1' if initial_state == 1 else '0') * nbits + '...'
mask = 1 << (field.degree - 1)
e = field.wrap(initial_state)
return ''.join(['1' if (e<<k).coeffs & mask else '0' for k in xrange(nbits)])+'...'

H25 = GF2QuotientRing(0x25)
print singalong(H25, 72)
000010010110011111000110111010100001001011001111100011011101010000100101...


That’s right, it repeats every $2^5-1 = 31$ bits.

Doesn’t ring a bell? Okay, let’s try a simpler one, $H_D = GF(2)[x]/(x^3 + x^2 + 1)$:

HD = GF2QuotientRing(0xD)
print singalong(HD, 72)
print singalong(HD, 72, initial_state=4)
001110100111010011101001110100111010011101001110100111010011101001110100...
111010011101001110100111010011101001110100111010011101001110100111010011...


That one repeats every $2^3-1=7$ bits. 0011101 0011101

Or if you start with $x^2$ as the initial state, 1110100 1110100

Now let’s say someone comes up to us and says they have this LFSR and its output looks like this:

11101000

Looks good until the 8th bit. Uh oh. Now what?

## Discrepancy!

Let’s take that polynomial $p(x) = x^3 + x^2 + 1$ again. It can’t be the polynomial that produces 11101000, unfortunately. But it was pretty close; it worked up all the way to 1110100. Maybe we can fix it. Let’s start by running the bits $y_0, y_1, y_2, \ldots y_n$ through it, in order, using an equation… hmm, what should we use? Well, the feedback equation of this LFSR is $y_n = y_{n-1} + y_{n-3}$, where the addition happens modulo 2. Let’s define $d[n]$ as the discrepancy between the actual bit output and the expected bit output, $d[n] = y_n - (y_{n-1} + y_{n-3}) = y_{n} + y_{n-1} + y_{n-3}$:

• $n=0: 1$ (not enough bits yet)
• $n=1: 1$ (not enough bits yet)
• $n=2: 1$ (not enough bits yet)
• $n=3: d=0 + 1 + 1 = 0$
• $n=4: d=1 + 0 + 1 = 0$
• $n=5: d=0 + 1 + 1 = 0$
• $n=6: d=0 + 0 + 0 = 0$
• $n=7: d=0 + 0 + 1 = 1$ → we have a discrepancy!

For reasons that may become clear soon, we’re going to reverse the order of the polynomial coefficients to create what Massey called the connection polynomial $C(x) = x^3p(x^{-1})$ which is $x^3 + x + 1$ in this case, and write the discrepancy more generally as $$d(C(x),n) = \sum\limits_{j=0}^L c_jy_{n-j} = c_0y_{n} + c_1y_{n-1} + c_2y_{n-2} + \ldots + c_Ly_{n-L}$$ where L is the number of shift cells, and the $c_j$ coefficients are the coefficients of this “inverted” characteristic polynomial $$C(x) = c_Lx^L + c_{L-1}x^{L-1} + \ldots + c_1 x + c_0$$

Our not-quite-adequate three-bit LFSR ($L=3$) has the connection polynomial $C(x) = x^3 + x + 1$ with $c_0 = c_1 = c_3 = 1$ and $c_2 = 0$, so in this case

\begin{align} d(C(x),n) &= c_0y_{n} + c_1y_{n-1} + c_2y_{n-2} + c_3y_{n-3} \cr &= 1\cdot y_{n}+1\cdot y_{n-1}+0\cdot y_{n-2} + 1\cdot y_{n-3} \end{align}

and we can rewrite our discrepancy calculations as

• $n=0: y_{0} = 1$ (not enough bits yet)
• $n=1: y_{1} = 1$ (not enough bits yet)
• $n=2: y_{2} = 1$ (not enough bits yet)
• $n=3: y_{3} = 0 \rightarrow d(C(x),3) = 1\cdot 0 + 1\cdot 1 + 0\cdot 1 + 1\cdot1 = 0$
• $n=4: y_{4} = 1 \rightarrow d(C(x),4) = 1\cdot 1 + 1\cdot 0 + 0\cdot 1 + 1\cdot1 = 0$
• $n=5: y_{5} = 0 \rightarrow d(C(x),5) = 1\cdot 0 + 1\cdot 1 + 0\cdot 0 + 1\cdot1 = 0$
• $n=6: y_{6} = 0 \rightarrow d(C(x),6) = 1\cdot 0 + 1\cdot 0 + 0\cdot 1 + 1\cdot0 = 0$
• $n=7: y_{7} = 0 \rightarrow d(C(x),7) = 1\cdot 0 + 1\cdot 0 + 0\cdot 0 + 1\cdot1 = 1$ → discrepancy!

The way the Berlekamp-Massey algorithm works is to figure out how to reconcile an LFSR polynomial that has a discrepancy with the stream of bits, by updating this polynomial to one of minimal degree that makes the discrepancy zero. We have a cubic polynomial $C(x) = x^3 + x^2 + 1$ that worked from $n=3$ until we got to $n=7$, and now we have to figure out a new polynomial that will suffice, and make the discrepancy zero, so we have to determine $c_7, c_6, c_5, c_4, c_3, c_2, c_1, c_0$ such that

$$d(C(x),7) = 0 = c_0y_7 + c_1y_6 + c_2y_5 + c_3y_4 + c_4y_3 + c_5y_2 + c_6y_1 + c_7y_0$$

Note that these coefficients are generally different than the previous set of original coefficients $c_3, c_2, c_1, c_0$. In Massey’s original paper, he used the polynomial $C^{(n)}(D)$ to denote the set of coefficients used at the $n$th step (with $n+1$ input bits) to compute the discrepancy, where $D$ is an indeterminate, essentially acting like the delay operator $z^{-1}$ in digital signal processing. Since we’re using $x$ as the polynomial variable rather than $D$, we’ll rewrite as $C^n(x)$ just to make things simpler. In this example, $C^4(x) = C^5(x) = C^6(x) = C^7(x) = x^3 + x + 1$, and the task is to find $C^8(x) \neq C^7(x)$.

Massey’s implementation of the algorithm uses the polynomial $B(x) = C^{m}(x)$ for some value of $m$, where $B(x)$ was the previous polynomial of smaller degree, that was replaced after step $m$. In this example, let’s just say that we know $B(x) = C^3(x) = x+1$ which was evaluated at step $m=3$ and then replaced by $C^4(x) = x^3 + x + 1$.

Now here’s the insight:

We can construct a new $C^{n+1}(x) = C^{n}(x) + x^{n-m}B(x)$. In our case we have $n=7, m=3$ so $C^8(x) = C^7(x) + x^4B(x) = x^3 + x + 1 + x^4(x+1) = x^5 + x^4 + x^3 + x + 1$.

The reason this works to create a new polynomial with discrepancy 0, is that we know that at $n=7$, $C^7(x)$ has a discrepancy of 1, and if we look at the discrepancy of $x^4B(x) = x^5 + x^4$ it’s $0 \cdot y_7 + 0\cdot y_6 + 0\cdot y_5 + 0\cdot y_4 + b_0 y_3 + b_1 y_2$ where the first 4 terms have coefficients 0 because we multiplied by $x^4$. But $b_0y_3 + b_1y_2$ is the discrepancy of $B(x)$ at $n=3$, which we know was 1 because it didn’t work and we had to replace it by $C^3(x)$. So we have two polynomials that each have discrepancy of 1, and that means if we add them together we get a discrepancy of 0. (Remember, we’re working with polynomials in the ring $GF(2)[x]$ so arithmetic on their coefficients is modulo 2.)

In general, what we do in the Berlekamp-Massey algorithm is to calculate the discrepancy $d(C^n(x),n)$ at step $n$, and then update the connection polynomial as follows:

$$C^{n+1}(x) = \begin{cases} C^{n}(x) & d(C^n(x),n) = 0 \cr C^{n}(x) + x^{n-m}C^{m}(x) & d(C^n(x),n) = 1 \end{cases}$$

where $m$ is the step at which the polynomial of previously-smaller degree had $d(C^m(x),m) = 1$; in the second case where $d(C^n(x),n) = 1$ and we had to add $x^{n-m}C^m(x)$, it’s fairly easy to show that $d(x^{n-m}C^m(x),n) = d(C^m(x),m) = 1$ — one calculation is just a time-shift of the other — and since the discrepancy is distributive ($d(C_1(x) + C_2(x),n) = d(C_1(x),n) + d(C_2(x),n)$ with addition mod 2) then that forces $d(C^{n+1}(x), n) = 0)$ in both cases.

We could also just write it as follows, giving us a potentially constant-time algorithm that “wastes” some computational effort if $d=0$:

$$C^{n+1}(x) = C^{n}(x) + x^{n-m}C^{m} d(C^n(x),n)$$

There are three minor issues here that we’ve glossed over, regarding the value of $L_n$, which is the number of bit cells in the shift register implementation, and which is also an upper bound on the degree of the polynomial $C^n(x)$.

One is this “upper bound” business. We could have a shift register with $L=3$ and a connection polynomial $C(x) = 1$. What does this look like? Essentially it’s a shift register with no feedback; the output is equal to the state, followed by an infinite string of zeros, so if the shift register contained initial state of 101 then the output would be 101000000.... I would call this a “degenerate” LFSR, something that’s not really useful in practice, but if we’re just trying to match a given finite output sequence, then it’s certainly a valid choice. In general, there are $L - \operatorname{deg} C(x)$ cells in the LFSR that don’t receive any feedback; if this number is greater than zero, then these cells contain initial conditions that get quickly replaced by zero bits. So normally the degree of the polynomial and the LFSR length are the same.

The second is how to update the value $L_n$, the size of the shift register with connection polynomial $C^n(x)$. It is possible to show (which I won’t do; you’ll have to read Massey’s paper or Berlekamp’s paper or some other reference on finite fields or LFSRs) that

$$L_{n+1} = \begin{cases} n+1 - L_n& n < 2L_n \cr L_n & n \geq 2L_n \end{cases}$$

The third is that if $n < 2L_n$, then the polynomial $C^{n}(x)$ is not unique. In fact, we can construct $C’(x) = C^{n}(x) + Q(x)C^m(x)x^{n-m}$ for any $Q(x)$ that has degree less than $2L_n-n$. If $n \geq 2L_n$ then there are enough bits to prove that $C^{n}(x)$ is the unique polynomial of smallest degree that can be used in an LFSR to generate the set of output bits. (Again, I’m not going to prove this; if you’re interested, see Massey’s paper.)

In our example, we have $C^8(x) = x^5 + x^4 + x^3 + x + 1$ with $L_8 = 8 - L_7 = 5, n = 8, m = 7,$ and $C^m(x) = x^3 + x + 1$. But since $8 < 2L_8 = 10$, we can take any $Q(x)$ with degree less than $10-8 = 2$ and multiply by $xC^7(x) = x^4 + x^2 + x$ and add this to $C^8(x)$ and it is a valid connection polynomial as well. For our choices $Q(x) = 1, Q(x) = x, Q(x) = x+1$ this yields $x^5 + x^3 + x^2 + 1$, $x^4 + x^2 + x + 1$, and $1$.

Here’s a demonstration of this, in Python:

for S0, p in [
(0x11, 0b110111),   # x^5 + x^4 + x^3 + x + 1
(0x19, 0b101101),   # x^5 + x^3 + x^2 + 1
(0x15, 0b111010),   # x^4 + x^2 + x + 1
(0x1d, 0b100000),   # 1
]:
print singalong(GF2QuotientRing(p), 30, initial_state=S0)

111010001001010110000111001101...
111010000101111010000101111010...
111010001101000110100011010001...
111010000000000000000000000000...


You’ll note that the bit strings all start with 11101000 and then contain all four possibilities 00, 01, 10, and 11 for the following two bits.

At the next step $n=9$, where the output is 111010001 and $C^9(x) = C^8(x) = x^5 + x^4 + x^3 + x + 1, L_9 = L_8 = 5, m=7,$ and $C^m(x) = x^3 + x + 1$. Here $2L-n = 1$, so the only valid choices of $Q(x)$ are $0$ and $1$, so this allows only $C^9(x)$ and $C^9(x) + C^7(x)x^2 = x^4 + x^2 + x + 1$, which are the first and third rows of the bit strings above.

(I should also note, by the way, that the full Berlekamp-Massey algorithm can work with a series of shift register outputs taking on values that are elements of any field, not just $GF(2)$. In this article we’re only talking about the binary version of Berlekamp-Massey; the non-binary version requires the calculation of the multiplicative inverse, but the full algorithm shares the same idea: use two polynomials with known nonzero discrepancy to construct a third polynomial that has zero discrepancy.)

Anyway, with all this as background, here’s a Python implementation of Berlekamp-Massey, along with a helper routine that prints out HTML tables in an IPython notebook.

from libgf2.util import parity

def bkm(bits, sink=None):
"""
Berlekamp-Massey algorithm

sink: used to store intermediate results for debugging or analysis
"""
m = -1
c = 1
b = 1
L = 0
fullbits = 0
nbits = len(bits)
reversed_bits = 0
for n in xrange(nbits):
reversed_bits = (reversed_bits << 1) | (1 if bits[n] else 0)
d = parity(reversed_bits&c)
if sink is not None:
sink.append(dict(n=n, m=m, L=L, b=b, c=c, d=d,
reversed_bits=reversed_bits))
if d == 1:
# Discrepancy!
cprev = c
c ^= b << (n-m)
if 2*L <= n:
L = n+1 - L
b = cprev
m = n
if sink is not None:
sink.append(dict(n=None, m=m, L=L, b=b, c=c))
return c,L

### The rest of this code is just to visualize each step in a table.

from IPython.core.display import display, HTML

def polylatex(coeffbits, reverse=False):
coeffs = []
y = coeffbits
n = 0
while y != 0:
coeffs.append(y & 1)
y >>= 1
n += 1
n -= 1

return ('$' + '+'.join('x^{%d}' % (n-j) if (n-j) > 1 else 'x' if (n-j) == 1 else '1' for j,c in enumerate(coeffs if reverse else coeffs[::-1]) if c != 0) + '$')

def bkm_html(bits, css=''):
sink = []
c,L = bkm(bits, sink)
html = """<div><style type='text/css' scoped='scoped'>
code span.mask { text-decoration: underline; text-decoration-color: red; }
td.bits { text-align: left; }""" + css + """
</style><table><th>$n$</th><th>$m$</th><th>$n-m$</th><th>$L$</th>
<th>$C^{n}(x)$</th><th>$C^m(x)$</th><th>bits</th><th>$d$</th>"""
for row in sink:
row.update(C=polylatex(row['c']),
B=polylatex(row['b']))
n = row['n']
if n is not None:
row['bitwidth'] = n+1
row['x'] = n-row['m']
rbits = row['reversed_bits']
styledbits = ''
for k in xrange(n+1):
styledbits = (('<span class="mask">%d</span>' if (bmask >> k) & 1 else '%d')
% ((rbits >> k) & 1)
+ styledbits)

html += ("""<tr><td>{n}</td><td>{m}</td><td>{x}</td><td>{L}</td><td>{C}</td><td>{B}</td>
<td class="bits"><code>{styledbits}</code></td><td>{d}</td></tr>"""
.format(styledbits=styledbits, **row))
else:
html += """<tr><td></td><td>{m}</td><td></td><td>{L}</td><td>{C}</td><td>{B}</td>
<td></td><td></td></tr>""".format(**row)
html += '</table></div>'
return HTML(html)

bkm_html([1,0,1,0,0])
$n$$m$$n-m$$L$ $C^{n}(x)$$C^m(x)$bits$d$
0-110$1$$1$ 11
1011$x+1$$1$ 101
2021$1$$1$ 1011
3212$x^{2}+1$$1$ 10100
4222$x^{2}+1$$1$ 101001
43$1$$x^{2}+1$

The bits column here contains a reversed string of the $n+1$ bits processed at step $n$; the bits corresponding to 1 coefficients of the polynomial $C^n(x)$ are underlined. (If you have a browser that doesn’t suck, they will be underlined in red, otherwise they’ll be underlined in black.) The discrepancy $d$ is the parity of the underlined bits, in other words, $d=1$ if the number of underlined 1 bits is odd, and $d=0$ if the number of underlined 1 bits is even.

Let’s take a longer example:

bkm_html([1,1,1,0,1,0,0,0,1,0,1,0,0,1,1,0,0,0,1,1,1,0,1,1,0,0],css="""
table {font-size: 88%; }""")
$n$$m$$n-m$$L$ $C^{n}(x)$$C^m(x)$bits$d$
0-110$1$$1$ 11
1011$x+1$$1$ 110
2021$x+1$$1$ 1110
3031$x+1$$1$ 11101
4313$x^{3}+x+1$$x+1$ 111010
5323$x^{3}+x+1$$x+1$ 1110100
6333$x^{3}+x+1$$x+1$ 11101000
7343$x^{3}+x+1$$x+1$ 111010001
8715$x^{5}+x^{4}+x^{3}+x+1$$x^{3}+x+1$ 1110100010
9725$x^{5}+x^{4}+x^{3}+x+1$$x^{3}+x+1$ 11101000100
10735$x^{5}+x^{4}+x^{3}+x+1$$x^{3}+x+1$ 111010001011
111016$x^{6}+x^{5}+x+1$$x^{5}+x^{4}+x^{3}+x+1$ 1110100010101
121026$x^{4}+x^{2}+1$$x^{5}+x^{4}+x^{3}+x+1$ 11101000101000
131036$x^{4}+x^{2}+1$$x^{5}+x^{4}+x^{3}+x+1$ 111010001010011
141318$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 1110100010100110
151328$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 11101000101001100
161338$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 111010001010011000
171348$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 1110100010100110000
181358$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 11101000101001100010
191368$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 111010001010011000110
201378$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 1110100010100110001110
211388$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 11101000101001100011100
221398$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 111010001010011000111010
2313108$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 1110100010100110001110110
2413118$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 11101000101001100011101100
2513128$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$ 111010001010011000111011000
138$x^{8}+x^{7}+x^{6}+x^{3}+x^{2}+1$$x^{4}+x^{2}+1$
print singalong(GF2QuotientRing(0b101100111),60,initial_state=205)

111010001010011000111011000000010111010110011100010011111110...


## Execution Time as a Function of Bit Length

The Berlekamp-Massey algorithm has one loop; each time the loop executes, we have to compute parity of $L_n$ bits of state, so the execution time is on the order of $\sum\limits_n L_n$ which is $O(n^2)$ or less since $L_n \leq n$. I’m not sure I can prove it, but I suspect its running time is $O(n^2)$ worst-case with $O(n)$ on average.

## Berlekamp-Massey in libgf2

The libgf2 module, which no one actually uses, contains an implementation of the Berlekamp-Massey algorithm, which returns the reversed connection polynomial so it can be used on Galois-style LFSRs. The unreversed connection polynomial is compatible with Fibonacci-style LFSRs.

from libgf2.util import berlekamp_massey

p, L = berlekamp_massey([1,1,1,0,1,0,0,0,1,0,1,0,0,1,1,0,0,0], 30)
print 'polynomial coeffs = 0b{0:b}, L={1:d}'.format(p,L)
polynomial coeffs = 0b101100111, L=8


There’s really not much to it!

## Wrapup

• We looked at the Berlekamp-Massey algorithm for computing a polynomial of minimum degree that is the characteristic polynomial of an LFSR that can produce a given sequence of output bits. The algorithm updates a “connection polynomial” $C^n(x)$ and a shift register length $L_n$ upon receiving $n$ input bit samples. The polynomial $C^n(x)$ describes a Fibonacci LFSR and should be reversed (zero-padded to $L_n$ bits before reversal, if necessary) for use as a Galois LFSR.
• The polynomial $C^n(x)$ determined by the Berlekamp-Massey algorithm is unique as long as the number of bit samples $n$ used to determine the polynomial satisfies $n \geq 2L_n$. Otherwise the polynomial is not unique.

Next time we’ll take a break from the heavy math and look at an example LFSR implementation on a 16-bit dsPIC.

## References

James Massey, Shift-Register Synthesis and BCH Decoding. IEEE Transactions on Information Theory, vol. 15, no. 1, pp. 122-127, January 1969. DOI 10.1109/TIT.1969.1054260, cached copies at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.80.9932

Elwyn Berlekamp, Non-Binary BCH Decoding. Institute of Statistics Mimeo Series 502, North Carolina State University Department of Statistics, December 1966.

Erin Casey, Berlekamp-Massey Algorithm. Student paper, University of Minnesota Research Experience for Undergraduates, 2000.