
Linear Feedback Shift Registers for the Uninitiated, Part VIII: Matrix Methods and State Recovery

Jason Sachs • November 21, 2017 • 4 comments

Last time we looked at a dsPIC implementation of LFSR updates. Now we’re going to go back to basics and look at some matrix methods, which is the third approach to represent LFSRs that I mentioned in Part I. And we’re going to explore the problem of converting from LFSR output to LFSR state.

Matrices: Beloved Historical Dregs

Elwyn Berlekamp’s 1966 paper Nonbinary BCH Decoding covers some work on error-correcting BCH codes, which James Massey recognized had applications to the LFSR, and as a result we have the Berlekamp-Massey algorithm, which we talked about last time. I’ve scanned through Berlekamp’s paper, and it’s just an elegant mess of Greek letters to me. But he does have a section, wryly named Beloved Historical Dregs, that covers a matrix equivalent to polynomial-based analysis (aka generating functions), mentioning the following in passing:

These equations are typically solved by means of a tedious Guass-Jordan [sic] reduction of the above matrix. This method requires many more computations than the generating function approach. The matrix method also requires more storage space. Furthermore, it is less elegant and harder to remember. In view of the iterative algorithm theorems, there is no longer any need to introduce these matrices at all. In short, the matrix method is now obsolete, and not used by those who think young.

Despite this warning, let’s proceed, revisiting our old friend, the LFSR that can be represented by the quotient ring \( H_{25} = GF(2)[x]/(x^5 + x^2 + 1) \).

The polynomial approach we’ve been using in the first six articles in this series, and in the libgf2 module that no one else actually uses, is based around the idea of working in quotient rings; successive states of the LFSR can be computed by multiplying by \( x \) at each step and then reducing to find the remainder modulo the characteristic polynomial \( p(x) = x^5 + x^2 + 1 \). Equivalently — and this is the bitwise boolean algebra method — we shift left by one place and if the leftmost bit is 1, we XOR with the bit vector 100101 that represents the coefficients of the characteristic polynomial.
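
Here’s a minimal sketch of that bitwise update in plain Python, just to nail it down (lfsr_step is a hypothetical helper, not part of libgf2):

def lfsr_step(state):
    # shift left; if the coefficient of x^5 pops out, reduce mod p(x)
    state <<= 1
    if state & 0x20:          # 0x20 = the x^5 bit
        state ^= 0x25         # 0x25 = 100101, which also clears the x^5 bit
    return state

s = 0b00001                   # x^0
for k in range(9):
    print('{0:05b}'.format(s))   # 00001, 00010, 00100, ..., 01101 = x^8
    s = lfsr_step(s)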

from IPython.core.display import display, HTML
from libgf2.gf2 import GF2QuotientRing

def polylatex(coeffbits, reverse=False):
    coeffs = []
    y = coeffbits
    n = 0
    while y != 0:
        coeffs.append(y & 1)
        y >>= 1
        n += 1
    n -= 1

    return ('\(' 
        + '+'.join('x^{%d}' % (n-j) if (n-j) > 1 else 'x' if (n-j) == 1 else '1'
                   for j,c in enumerate(coeffs if reverse else coeffs[::-1])
                   if c != 0)
        + '\)')

def lfsr_table(field, nrows):
    html = """<div><style type='text/css' scoped='scoped'>
    </style><table><th>\(x^k\)</th><th>\(x^k \\bmod p(x)\)</th><th>bit vector</th>"""
    y = field.wrap(1)
    n = field.degree
    for k in xrange(nrows):
        html += ("""<tr><td>\(x^{{{k}}}\)</td><td>{ystr}</td><td><code>{bits:0{n}b}</code></td></tr>"""
                     .format(k=k, ystr=polylatex(y.coeffs), bits=y.coeffs, n=n))
        y <<= 1
    html += '</table></div>'
    return HTML(html)

lfsr_table(GF2QuotientRing(0x25), 15)
\(x^k\)    | \(x^k \bmod p(x)\)         | bit vector
\(x^{0}\)  | \(1\)                      | 00001
\(x^{1}\)  | \(x\)                      | 00010
\(x^{2}\)  | \(x^{2}\)                  | 00100
\(x^{3}\)  | \(x^{3}\)                  | 01000
\(x^{4}\)  | \(x^{4}\)                  | 10000
\(x^{5}\)  | \(x^{2}+1\)                | 00101
\(x^{6}\)  | \(x^{3}+x\)                | 01010
\(x^{7}\)  | \(x^{4}+x^{2}\)            | 10100
\(x^{8}\)  | \(x^{3}+x^{2}+1\)          | 01101
\(x^{9}\)  | \(x^{4}+x^{3}+x\)          | 11010
\(x^{10}\) | \(x^{4}+1\)                | 10001
\(x^{11}\) | \(x^{2}+x+1\)              | 00111
\(x^{12}\) | \(x^{3}+x^{2}+x\)          | 01110
\(x^{13}\) | \(x^{4}+x^{3}+x^{2}\)      | 11100
\(x^{14}\) | \(x^{4}+x^{3}+x^{2}+1\)    | 11101

Very simple. Now, the third approach for representing the state of the LFSR is to use vectors of bits along with the companion matrix of the characteristic polynomial, and compute in \( GF(2) \), disregarding all this polynomial stuff. The companion matrix of \( p(x) = x^5 + x^2 + 1 \) is

$$C = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \cr 1 & 0 & 0 & 0 & 0 \cr 0 & 1 & 0 & 0 & 1 \cr 0 & 0 & 1 & 0 & 0 \cr 0 & 0 & 0 & 1 & 0 \cr \end{bmatrix}$$

which is a square matrix of \( n \) rows and columns, where \( n \) is the degree of the characteristic polynomial, and the entries are determined as follows:

  • the subdiagonal elements are all ones; these coefficients act to shift a bit vector by one place when left-multiplying by the matrix \( C \)
  • the rightmost column contains the coefficients of the characteristic polynomial up to degree \( n-1 \)
  • all other entries are zero

The idea is that this matrix \( C \) transforms any bit vector \( S[k] \) representing the state of the LFSR into the state \( S[k+1] \) of the LFSR at the next timestep; in other words, \( S[k+1] = CS[k] \), or if we use the coefficients \( c_j \) of the characteristic polynomial acting on each state bit \( S_j[k] \), then this matrix equation is equivalent to

$$S_j[k+1] = \begin{cases} c_jS_{n-1}[k] & j = 0 \cr c_jS_{n-1}[k] + S_{j-1}[k] & j > 0 \end{cases}$$

In the case of \( H_{25} \) this is

$$\begin{align} S_0[k+1] &= S_4[k] \cr S_1[k+1] &= S_0[k] \cr S_2[k+1] &= S_1[k] + S_4[k] \cr S_3[k+1] &= S_2[k] \cr S_4[k+1] &= S_3[k] \end{align}$$

or, in matrix form:

$$\begin{bmatrix}S_0[k+1] \cr S_1[k+1] \cr S_2[k+1] \cr S_3[k+1] \cr S_4[k+1] \end{bmatrix}= \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \cr 1 & 0 & 0 & 0 & 0 \cr 0 & 1 & 0 & 0 & 1 \cr 0 & 0 & 1 & 0 & 0 \cr 0 & 0 & 0 & 1 & 0 \cr \end{bmatrix} \begin{bmatrix}S_0[k] \cr S_1[k] \cr S_2[k] \cr S_3[k] \cr S_4[k] \end{bmatrix}$$

Here the companion matrix and state vectors are oriented so that the first row and column represent the rightmost shift cell of an LFSR that shifts from right to left, and the last row and column represent the leftmost shift cell.
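
As a quick numeric check, here’s one step of \( S[k+1] = CS[k] \bmod 2 \) in plain numpy (a sketch; the state bit values come straight from the table above):

import numpy as np

C = np.array([[0,0,0,0,1],
              [1,0,0,0,0],
              [0,1,0,0,1],
              [0,0,1,0,0],
              [0,0,0,1,0]], dtype=np.uint8)

S5 = np.array([1,0,1,0,0])    # x^5 = 00101, stored S_0 first
print(C.dot(S5) % 2)          # [0 1 0 1 0] -> bit vector 01010 = x^6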

This approach treats the LFSR state \( S[k] \) as a bit vector representation of the finite field element \( x^k \), where the companion matrix \( C \) can be used to compute the next state \( S[k+1] \). We can also represent the field element \( x^k \) as the matrix \( C^k \), that is, \( C \) raised to the \( k \)th power (again, with computation in \( GF(2) \), so that all the matrix elements are always 0 or 1); the equivalent bit vector is the contents of the leftmost column:

import numpy as np

def matrixlatex(M):
    latex = '\\begin{bmatrix}'
    for k,row in enumerate(M):
        if k > 0:
            latex += '\\cr '
        latex += '&'.join('%d' % c for c in row)
    latex += '\\end{bmatrix}'
    return latex

def companion_matrix_of_field(field):
    n = field.degree

    # Construct companion matrix
    C = np.zeros((n,n), dtype=np.uint8)
    k = np.arange(n-1)
    C[k+1,k] = 1          # subdiagonal
    C[:,n-1] = [(field.coeffs >> k) & 1 for k in xrange(n)]
    return C
    
def lfsr_matrix_table(field, nrows):
    html = """<div><style type='text/css' scoped='scoped'>
    </style><table><th>\(x^k\)</th><th>\(x^k \\bmod p(x)\)</th><th>bit vector</th><th>matrix representation \(C^k\)</th>"""
    y = field.wrap(1)
    n = field.degree

    C = companion_matrix_of_field(field)

    Ck = np.eye(n, dtype=np.uint8)
    for k in xrange(nrows):
        html += ("""<tr><td>\(x^{{{k}}}\)</td><td>{ystr}</td><td><code>{bits:0{n}b}</code></td><td>{matrix}</td></tr>"""
                     .format(k=k, ystr=polylatex(y.coeffs), bits=y.coeffs, n=n, matrix=matrixlatex(Ck)))
        y <<= 1
        Ck = np.dot(Ck,C) & 1
    html += '</table></div>'
    return HTML(html)    

H25 = GF2QuotientRing(0x25)
lfsr_matrix_table(H25, 15)
\(x^k\) | \(x^k \bmod p(x)\) | bit vector | matrix representation \(C^k\)
\(x^{0}\) | \(1\) | 00001 | \(\begin{bmatrix}1&0&0&0&0\cr 0&1&0&0&0\cr 0&0&1&0&0\cr 0&0&0&1&0\cr 0&0&0&0&1\end{bmatrix}\)
\(x^{1}\) | \(x\) | 00010 | \(\begin{bmatrix}0&0&0&0&1\cr 1&0&0&0&0\cr 0&1&0&0&1\cr 0&0&1&0&0\cr 0&0&0&1&0\end{bmatrix}\)
\(x^{2}\) | \(x^{2}\) | 00100 | \(\begin{bmatrix}0&0&0&1&0\cr 0&0&0&0&1\cr 1&0&0&1&0\cr 0&1&0&0&1\cr 0&0&1&0&0\end{bmatrix}\)
\(x^{3}\) | \(x^{3}\) | 01000 | \(\begin{bmatrix}0&0&1&0&0\cr 0&0&0&1&0\cr 0&0&1&0&1\cr 1&0&0&1&0\cr 0&1&0&0&1\end{bmatrix}\)
\(x^{4}\) | \(x^{4}\) | 10000 | \(\begin{bmatrix}0&1&0&0&1\cr 0&0&1&0&0\cr 0&1&0&1&1\cr 0&0&1&0&1\cr 1&0&0&1&0\end{bmatrix}\)
\(x^{5}\) | \(x^{2}+1\) | 00101 | \(\begin{bmatrix}1&0&0&1&0\cr 0&1&0&0&1\cr 1&0&1&1&0\cr 0&1&0&1&1\cr 0&0&1&0&1\end{bmatrix}\)
\(x^{6}\) | \(x^{3}+x\) | 01010 | \(\begin{bmatrix}0&0&1&0&1\cr 1&0&0&1&0\cr 0&1&1&0&0\cr 1&0&1&1&0\cr 0&1&0&1&1\end{bmatrix}\)
\(x^{7}\) | \(x^{4}+x^{2}\) | 10100 | \(\begin{bmatrix}0&1&0&1&1\cr 0&0&1&0&1\cr 1&1&0&0&1\cr 0&1&1&0&0\cr 1&0&1&1&0\end{bmatrix}\)
\(x^{8}\) | \(x^{3}+x^{2}+1\) | 01101 | \(\begin{bmatrix}1&0&1&1&0\cr 0&1&0&1&1\cr 1&0&0&1&1\cr 1&1&0&0&1\cr 0&1&1&0&0\end{bmatrix}\)
\(x^{9}\) | \(x^{4}+x^{3}+x\) | 11010 | \(\begin{bmatrix}0&1&1&0&0\cr 1&0&1&1&0\cr 0&0&1&1&1\cr 1&0&0&1&1\cr 1&1&0&0&1\end{bmatrix}\)
\(x^{10}\) | \(x^{4}+1\) | 10001 | \(\begin{bmatrix}1&1&0&0&1\cr 0&1&1&0&0\cr 0&1&1&1&1\cr 0&0&1&1&1\cr 1&0&0&1&1\end{bmatrix}\)
\(x^{11}\) | \(x^{2}+x+1\) | 00111 | \(\begin{bmatrix}1&0&0&1&1\cr 1&1&0&0&1\cr 1&1&1&1&1\cr 0&1&1&1&1\cr 0&0&1&1&1\end{bmatrix}\)
\(x^{12}\) | \(x^{3}+x^{2}+x\) | 01110 | \(\begin{bmatrix}0&0&1&1&1\cr 1&0&0&1&1\cr 1&1&1&1&0\cr 1&1&1&1&1\cr 0&1&1&1&1\end{bmatrix}\)
\(x^{13}\) | \(x^{4}+x^{3}+x^{2}\) | 11100 | \(\begin{bmatrix}0&1&1&1&1\cr 0&0&1&1&1\cr 1&1&1&0&0\cr 1&1&1&1&0\cr 1&1&1&1&1\end{bmatrix}\)
\(x^{14}\) | \(x^{4}+x^{3}+x^{2}+1\) | 11101 | \(\begin{bmatrix}1&1&1&1&1\cr 0&1&1&1&1\cr 1&1&0&0&0\cr 1&1&1&0&0\cr 1&1&1&1&0\end{bmatrix}\)

As I quoted earlier, Berlekamp mentioned the matrix methods are slower and take up more storage space than methods using polynomials. Specifically:

  • Since an \( n \times n \) matrix multiply takes \( O(n^3) \) multiplications (or \( \approx O(n^{2.8074}) \) using Strassen multiplication), it’s slower than the polynomial bit vector method, which runs in \( O(n^2) \) operations for large \( n \)
  • The storage space requirements are \( O(n^2) \), vs. \( O(n) \) for bit vector polynomials

So why bother? Why don’t we “think young” and use polynomials and finite fields everywhere, instead of messing about with matrices?

Uses for Matrix Representation in LFSR Analysis

There are still a few reasons to resort to matrix methods.

One is understanding; if you get hung up on the finite field representation, but the matrix method makes sense to you, then by all means please use it.

But the main reason involves the manipulation of LFSR output bitstreams. Working with LFSR state is fairly simple: it can be thought of as the element \( x^k \) in a finite field. But then if we want to describe the LFSR output, it’s not so easy; it involves taking the coefficient of \( x^{n-1} \), which reduces an \( n \)-bit state to a 1-bit output, and how do you do that? The generating function approach is one option; it’s essentially the same thing as z-transforms in digital signal processing, where we represent the infinitely repeating output stream \( y[k] \) as a power series equivalent \( \sum\limits_k y[k]x^k = \frac{1}{p(x)} \), but somehow whenever I try to use this, I get stuck — where do you plug in the state \( S[k] \)? There’s also something called a trace map which is part of the whole finite field rigamarole, but it makes my head ache to try to understand it. Whereas with matrix methods, it’s easy; if we have a state vector \( S = \begin{bmatrix}S_0 & S_1 & S_2 & \ldots & S_{n-2} & S_{n-1}\end{bmatrix}^T \) then we just express the output as \( y = W_0 S \) where \( W_0 = \begin{bmatrix}0 & 0 & 0 & \ldots & 0 & 1\end{bmatrix} \) and this pulls out the coefficient of \( x^{n-1} \). We’ll use this method in two applications.
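
For example, here’s that output extraction applied to the state \( x^8 = \) 01101 (a sketch; the expected value matches the output stream shown in the next section):

import numpy as np

W0 = np.array([0,0,0,0,1])    # picks off S_4, the coefficient of x^4
S8 = np.array([1,0,1,1,0])    # x^8 = 01101, stored S_0 first
print(W0.dot(S8) % 2)         # 0, which is y[8] of the output stream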

Time Shifts

Now here’s an interesting proposition: suppose we know the state \( S[k] \) of the LFSR at step \( k \) and we want to know the output \( y[k+d] \) where \( d \) is a fixed offset known ahead of time. There are two ways to compute this.

The finite field approach tells us we can compute \( x^dS \) and take the coefficient of \( x^{n-1} \). For example, in \( H_{25} \) if we start with \( x^5 = x^2 + 1 \) and we know \( d=4 \) then we multiply by \( x^4 \) to get \( x^4(x^2 + 1) = x^6 + x^4 = x^4 + x^3 + x \), and then pull out the coefficient of \( x^4 \) to get 1. Finite field multiplication takes \( O(n^2) \) steps for large \( n \).

The matrix approach lets us do something different. The output advanced by \( d \) steps can be expressed as \( y[k+d] = W_0S[k+d] = W_0C^dS[k] \); if we take the companion matrix \( C \) and raise it to the \( d \)th power and left-multiply by the vector \( W_0 = \begin{bmatrix}0 & 0 & 0 & \ldots & 0 & 1\end{bmatrix} \), then we have a vector \( W_d \) that we can use to extract the output \( y[k+d] \) in only \( O(n) \) steps. It is also worth noting that because of the structure of \( W_0 \), the vector \( W_d = W_0C^d \) is the bottom row of \( C^d \). In the example given above with \( H_{25} \) and \( d=4 \), then the bottom row of \( C^4 \) is \( \begin{bmatrix}1 & 0 & 0 & 1 & 0\end{bmatrix} \), so for any state vector we can just take \( S_0 + S_3 \) to find the output four steps in advance.
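
Here’s that computation as a sketch, reusing companion_matrix_of_field and H25 from the code above (the loop reduces mod 2 at each step so the entries stay in \( GF(2) \)):

import numpy as np

C = companion_matrix_of_field(H25)
Cd = np.eye(H25.degree, dtype=np.uint8)
for _ in range(4):            # d = 4
    Cd = Cd.dot(C) & 1
print(Cd[-1])                 # bottom row W_4 = [1 0 0 1 0] -> S_0 + S_3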

Just to double-check here, let’s look at the output stream:

000010010110011

Now let’s look at \( S_0 + S_3 \) for each step:

  • \( S[0] = \) 00001, \( 1+0 = 1 \)
  • \( S[1] = \) 00010, \( 0+0 = 0 \)
  • \( S[2] = \) 00100, \( 0+0 = 0 \)
  • \( S[3] = \) 01000, \( 0+1 = 1 \)
  • \( S[4] = \) 10000, \( 0+0 = 0 \)
  • \( S[5] = \) 00101, \( 1+0 = 1 \)
  • \( S[6] = \) 01010, \( 0+1 = 1 \)
  • \( S[7] = \) 10100, \( 0+0 = 0 \)
  • \( S[8] = \) 01101, \( 1+1 = 0 \)
  • \( S[9] = \) 11010, \( 0+1 = 1 \)
  • \( S[10] = \) 10001, \( 1+0 = 1 \)
  • \( S[11] = \) 00111, \( 1+0 = 1 \)
  • \( S[12] = \) 01110, \( 0+1 = 1 \)
  • \( S[13] = \) 11100, \( 0+1 = 1 \)
  • \( S[14] = \) 11101, \( 1+1 = 0 \)

so the output stream advanced by 4 steps should be

100101100111110

Pretty neat, huh? This method works for any value of \( d \), and it is easy to see that no matter what \( d \) is, we can express a shifted version \( y[k+d] \) from the state \( S[k] \) by XORing an appropriate subset of bits of the state \( S[k] \) as designated by the vector \( W_d = W_0C^d \). (Again, \( W_d \) is just the bottom row of the matrix \( C^d \). Essentially we use \( W_d \) as a bitmask for the state \( S[k] \), and then find the parity of the result.)

It turns out that we can actually figure out \( W_d \) as a realizable computation without using any matrix math. That’s right, I just told you a while back that matrix methods were useful in some cases, and now I’m saying we can get along without them after all. The astute reader may notice that the coefficients of the \( j \)th column (for any \( j \) numbered \( 0, 1, 2, \ldots n-1 \)) of the matrix \( C^d \) are also the coefficients of \( x^{d+j} \bmod p(x) \). For example, consider \( x^8 = x^3 + x^2 + 1 \pmod {x^5 + x^2 + 1} \), which, expressed in bit vector form, is 01101, and if we express it as a column vector from least-significant to most-significant bit, is \( \begin{bmatrix}1 & 0 & 1 & 1 & 0\end{bmatrix}^T \), which is column 0 of \( C^8 \), column 1 of \( C^7 \), column 2 of \( C^6 \), column 3 of \( C^5 \), and column 4 of \( C^4 \). Why do we care? Because \( W_d \) is the bottom row of \( C^d \), so if we get the bottom component of each column of \( C^d \) then that gives us \( W_d \).
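
A quick check of the column observation (again a sketch, reusing the helpers above): column 4 of \( C^4 \) should hold the coefficients of \( x^{4+4} = x^8 \bmod p(x) \), least significant coefficient in row 0:

import numpy as np

Cd = np.eye(H25.degree, dtype=np.uint8)
for _ in range(4):
    Cd = Cd.dot(companion_matrix_of_field(H25)) & 1
print(Cd[:,4])                                      # [1 0 1 1 0]
print('{0:05b}'.format((H25.wrap(1) << 8).coeffs))  # 01101 = x^8, same bits reversed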

So here’s an equivalent calculation using finite fields only:

  • Precomputation:
    • Compute \( x^d \)
    • Compute \( W_{d} \) on a component-by-component basis: for \( j = 0, 1, 2, \ldots n-1 \), \( W_{d,j} = \) the bottom component of the \( j \)th column of \( C^d \) which is also the coefficient of \( x^{n-1} \) of \( x^{d+j} \bmod p(x) \).
  • Real-time computation:
    • The output \( y[k+d] \) can therefore be expressed as a linear combination of the state bits \( S[k] \):

$$y[k+d] = W_dS[k] = \sum\limits_{j=0}^{n-1} W_{d,j}S_j[k]$$

There’s probably some way to express this identity in terms of trace maps or some other aspect of finite fields, but we were able to do some analysis in the matrix representation fairly easily, and then port it back over to the polynomial bit vector representation. In other words, the matrix math was important for us to understand analytically what was going on, but when it came time to compute something tangible, we were able to find a better method using quotient ring elements and bit vectors.

Let’s actually do this computation in Python:

from libgf2.util import parity

def time_shifter(field, d):
    """ Precomputes coefficients to calculate the output of an LFSR
    d steps ahead of the known state S"""
    
    n = field.degree
    Wd = 0
    xdj = field.wrap(1) << d
    for j in xrange(n):
        b = (xdj.coeffs >> n-1)
        # extract the coeff of x^(n-1) from x^(d+j)
        Wd ^= b << j
        xdj <<= 1
    return Wd       
H25 = GF2QuotientRing(0x25)
Wd = time_shifter(H25, 4)
'{0:05b}'.format(Wd)
'01001'
def gather(state, n):
    """ Returns the n bits y[k], y[k+1], ... y[k+n-1]
    given state S[k]
    """
    maskpos = state.field.degree - 1
    result = []
    for _ in xrange(n):
        result.append(state.coeffs >> maskpos)
        state <<= 1
    return result

def bits_to_text(bits):
    return ''.join('1' if b else '0' for b in bits)
            
def gather_from_lookahead(state, n, d):
    """ We're going to return the n bits y[k+d], y[k+d+1], ... y[k+d+n-1]
    but do this based on the state S[k], S[k+1], S[k+2], ... S[n-1]
    by using lookahead coefficients Wd as a mask and then computing parity.
    """
    Wd = time_shifter(state.field, d)
    result = []
    for _ in xrange(n):
        result.append(parity(state.coeffs & Wd))
        state <<= 1
    return result
for d in xrange(15):
    print "d=%2d: %s\n      %s" % (
            d,
            bits_to_text(gather(H25.wrap(1) << d, 15)),
            bits_to_text(gather_from_lookahead(H25.wrap(1), 15, d)))
        
d= 0: 000010010110011
      000010010110011
d= 1: 000100101100111
      000100101100111
d= 2: 001001011001111
      001001011001111
d= 3: 010010110011111
      010010110011111
d= 4: 100101100111110
      100101100111110
d= 5: 001011001111100
      001011001111100
d= 6: 010110011111000
      010110011111000
d= 7: 101100111110001
      101100111110001
d= 8: 011001111100011
      011001111100011
d= 9: 110011111000110
      110011111000110
d=10: 100111110001101
      100111110001101
d=11: 001111100011011
      001111100011011
d=12: 011111000110111
      011111000110111
d=13: 111110001101110
      111110001101110
d=14: 111100011011101
      111100011011101

For each value of \( d \) we computed the output bits \( y[k+d] \) in two ways:

  • first, by just computing \( x^{k+d} \) in the finite field and extracting the coefficient of \( x^{n-1} \)
  • second, by using the lookahead approach by precomputing \( W_d \) and applying it to the state vector \( S[k] \).

State Recovery: Going from Output Bits to State

Next we’ll tackle the problem of state recovery, namely, suppose we have a known \( n \)-bit LFSR with characteristic polynomial \( p(x) \); we don’t know the LFSR state but we can observe \( n \) consecutive bits of output \( y[k], y[k+1], y[k+2], \ldots, y[k+n-1] \), and we don’t know \( k \). Can we determine the LFSR state \( S[k] \) from this information?

The answer is yes, and the easy brute-force way of getting there involves matrices.

Let’s imagine for a moment that we did know the state \( S[k] \). The output bit \( y[k] = W_0S[k] \). We know from the previous section that the output bit \( y[k+1] = W_1S[k] = W_0CS[k] \), and \( y[k+2] = W_2S[k] = W_0C^2S[k] \), and so on. So we can write an equation for the vector of \( n \) consecutive output bits:

$$ Y[k] = \begin{bmatrix}y[k] \cr y[k+1] \cr y[k+2] \cr \vdots \cr y[k+n-1]\end{bmatrix} = W S[k] $$

where \( W \) is an \( n\times n \) matrix defined by

$$ W = \begin{bmatrix}W_0 \cr W_0C \cr W_0C^2 \cr \vdots \cr W_0C^{n-1}\end{bmatrix}$$

and note again that \( W_0C^d \) is the bottom row of the matrix \( C^d \).

Well, it’s a straightforward matrix problem to solve \( Y[k]=WS[k] \) for \( S[k] \). Conceptually we can think of it as \( S[k] = W^{-1}Y[k] \), although the usual mantra in numerical matrix computation is never actually compute the inverse; the general method for solving this matrix equation is to factor \( W = LU \) (with a permutation matrix, \( PW=LU \), if necessary), then use forward substitution to solve \( Y[k] = LV \) for \( V \), and back-substitution to solve \( US[k]=V \) for \( S[k] \). In the world of continuous-valued matrices we often use the \( QR \) factorization rather than \( LU \) factorization for numerical stability, but here in the world of \( GF(2) \) there’s no numerical accuracy issue, so \( LU \) factoring is fine. That’s the brute-force method, at least. But if we try an example of this, we’ll see there’s an easier way.
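
Just to make that brute-force step concrete, here’s a minimal solver sketch over \( GF(2) \) (gf2_solve is a hypothetical helper, not part of libgf2 or scipy; it does Gauss-Jordan elimination rather than an \( LU \) factorization, with each matrix row packed into an integer so that elimination is a single XOR):

def gf2_solve(A, y):
    """Solve A x = y over GF(2). A is a list of rows (lists of 0/1),
    y a list of 0/1. Assumes A is invertible."""
    n = len(A)
    # pack each row as an integer, with the right-hand side bit at bit n
    rows = [sum(bit << j for j, bit in enumerate(row)) | (y[i] << n)
            for i, row in enumerate(A)]
    for col in range(n):
        # find a pivot row with a 1 in this column and swap it into place
        pivot = next(i for i in range(col, n) if (rows[i] >> col) & 1)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # eliminate this column from every other row
        for i in range(n):
            if i != col and (rows[i] >> col) & 1:
                rows[i] ^= rows[col]
    return [(rows[i] >> n) & 1 for i in range(n)]

W = [[0,0,0,0,1],
     [0,0,0,1,0],
     [0,0,1,0,0],
     [0,1,0,0,1],
     [1,0,0,1,0]]
print(gf2_solve(W, [0,0,1,0,1]))   # Y for k=5 -> [1, 0, 1, 0, 0] = x^5 = 00101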

Let’s take our field \( H_{25} \) again. The matrix \( W \) is

$$ W = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \cr 0 & 0 & 0 & 1 & 0 \cr 0 & 0 & 1 & 0 & 0 \cr 0 & 1 & 0 & 0 & 1 \cr 1 & 0 & 0 & 1 & 0 \end{bmatrix} $$

This is interesting because it’s a Hankel matrix with upper part 0, which means we could flip it horizontally to get lower triangular, or vertically to get upper triangular.
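
A quick sanity check of that observation with numpy’s flips (just a sketch):

import numpy as np

W = np.array([[0,0,0,0,1],
              [0,0,0,1,0],
              [0,0,1,0,0],
              [0,1,0,0,1],
              [1,0,0,1,0]], dtype=np.uint8)
print(np.fliplr(W))   # horizontal flip: lower triangular
print(np.flipud(W))   # vertical flip: upper triangular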

Let’s solve instead for \( Y[k]=\bar{W}\bar{S}[k] \) where \( \bar{S}[k] \) is the reflected state vector (coefficient of \( x^{n-1} \) in the top element, rather than the coefficient of \( x^0 \)):

$$ \bar{S}[k] = \begin{bmatrix}S_{n-1}[k] \cr S_{n-2}[k] \cr S_{n-3}[k] \cr \vdots \cr S_0[k]\end{bmatrix}$$

This flips the W matrix horizontally to become a lower triangular Toeplitz matrix, since its columns act upon the state vector’s rows, so

$$ \bar{W} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \cr 0 & 1 & 0 & 0 & 0 \cr 0 & 0 & 1 & 0 & 0 \cr 1 & 0 & 0 & 1 & 0 \cr 0 & 1 & 0 & 0 & 1 \end{bmatrix} $$

and since \( \bar{W} \) is lower triangular, we can just solve \( Y[k] = \bar{W}\bar{S}[k] \) by forward substitution, working from the top row down, e.g.:

$$\begin{align} S_{n-1}[k] &= Y[k] \cr S_{n-2}[k] &= Y[k+1] \cr S_{n-3}[k] &= Y[k+2] \cr S_{n-4}[k] &= Y[k+3] + S_{n-1}[k] \cr S_{n-5}[k] &= Y[k+4] + S_{n-2}[k] \end{align}$$

Since the \( \bar{W} \) matrix is a lower triangular Toeplitz matrix, the first column completely determines the remaining content, and the \( j \)th entry of the first column is the coefficient of \( x^{n-1} \) in \( x^{n+j-1} \bmod p(x) \).

import scipy.linalg

def compute_Wbar(field, first_column_only=False):
    top_bitpos = (field.degree-1)
    y = field.wrap(1 << top_bitpos)
    col0 = []
    for _ in xrange(field.degree):
        col0.append(y.coeffs >> top_bitpos)
        y <<= 1
    if first_column_only:
        return col0
    else:
        return scipy.linalg.toeplitz(col0, [1]+[0]*top_bitpos)

Wbar=compute_Wbar(H25)
Wbar
array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 1, 0],
       [0, 1, 0, 0, 1]])
def state_recovery1(output, field=None, Wbar=None):
    if Wbar is None:
        Wbar = compute_Wbar(field)
    # Now solve Y = Wbar * Sbar for Sbar, where Y is the output bits
    S = (scipy.linalg.solve_triangular(a=Wbar,b=output,lower=True) % 2).astype(np.int)
    S = np.flipud(S)
    return S

And to check whether it works, we can just plug things in:

Ylist = np.array([int(c) for c in '0000100101100111110001'])
Ylist
array([0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1])
e = H25.wrap(1)
def state_recovery_test(Ylist, field, solver, **kwargs):
    n = field.degree
    for k in xrange(len(Ylist)-n+1):
        Y = Ylist[k:k+n]
        S = solver(Y, **kwargs)
        Sbits = sum(b<<k for k,b in enumerate(S))
        print '{0} from state {1} = {2:0{n}b} ({3:0{n}b} expected)'.format(
            Y,S,Sbits,(e<<k).coeffs, n=n)
        
state_recovery_test(Ylist, e.field, state_recovery1, Wbar=Wbar)
[0 0 0 0 1] from state [1 0 0 0 0] = 00001 (00001 expected)
[0 0 0 1 0] from state [0 1 0 0 0] = 00010 (00010 expected)
[0 0 1 0 0] from state [0 0 1 0 0] = 00100 (00100 expected)
[0 1 0 0 1] from state [0 0 0 1 0] = 01000 (01000 expected)
[1 0 0 1 0] from state [0 0 0 0 1] = 10000 (10000 expected)
[0 0 1 0 1] from state [1 0 1 0 0] = 00101 (00101 expected)
[0 1 0 1 1] from state [0 1 0 1 0] = 01010 (01010 expected)
[1 0 1 1 0] from state [0 0 1 0 1] = 10100 (10100 expected)
[0 1 1 0 0] from state [1 0 1 1 0] = 01101 (01101 expected)
[1 1 0 0 1] from state [0 1 0 1 1] = 11010 (11010 expected)
[1 0 0 1 1] from state [1 0 0 0 1] = 10001 (10001 expected)
[0 0 1 1 1] from state [1 1 1 0 0] = 00111 (00111 expected)
[0 1 1 1 1] from state [0 1 1 1 0] = 01110 (01110 expected)
[1 1 1 1 1] from state [0 0 1 1 1] = 11100 (11100 expected)
[1 1 1 1 0] from state [1 0 1 1 1] = 11101 (11101 expected)
[1 1 1 0 0] from state [1 1 1 1 1] = 11111 (11111 expected)
[1 1 0 0 0] from state [1 1 0 1 1] = 11011 (11011 expected)
[1 0 0 0 1] from state [1 1 0 0 1] = 10011 (10011 expected)

Tada! The output is on the left, the corresponding LFSR state is on the right.

The only thing that doesn’t sit right with me is the dependency on matrices — yes, it’s time for me to assert again that though the matrix approach does help with understanding, in the end, the matrix computations are unnecessary and we can use bitwise computation or finite fields.

The fact that there’s a rich structure in the \( \bar{W} \) matrix means that we can exploit this and just store its first column, which I’ll call \( w \); each coefficient \( w_j \) is the coefficient of \( x^{n-1} \) in \( x^{n+j-1} \bmod p(x) \).

Forward substitution in this lower triangular system computes, for each row \( j \):

$$\bar{S}_j[k] = Y[k+j] + \sum\limits_{i=1}^{j}w_i\bar{S}_{j-i}[k]$$

In other words

$$\begin{align} S_{n-1}[k] = \bar{S}_0[k] &= Y[k] \cr S_{n-2}[k] = \bar{S}_1[k] &= Y[k+1] + w_1\bar{S}_0[k]\cr S_{n-3}[k] = \bar{S}_2[k] &= Y[k+2] + w_1\bar{S}_1[k] + w_2\bar{S}_0[k]\cr S_{n-4}[k] = \bar{S}_3[k] &= Y[k+3] + w_1\bar{S}_2[k] + w_2\bar{S}_1[k] + w_3\bar{S}_0[k]\cr &\vdots \cr S_{0}[k] = \bar{S}_{n-1}[k] &= Y[k+n-1] + w_1\bar{S}_{n-2}[k] + w_2\bar{S}_{n-3}[k] + \ldots + w_{n-2}\bar{S}_1[k] + w_{n-1}\bar{S}_0[k] \end{align}$$

Let’s code this in Python!

def state_recovery2(output, field=None, w=None):
    if w is None:
        w = compute_Wbar(field, first_column_only=True)
        n = field.degree
    else:
        n = len(w)
        
    S = np.zeros(n, dtype=np.int)
    for j in xrange(n):
        Sbarj = output[j]
        for i in xrange(1,j+1):
            Sbarj ^= w[i]*S[n-1-j+i]
        S[n-1-j] = Sbarj
    return S
e = H25.wrap(1)
w = compute_Wbar(e.field, first_column_only=True)
state_recovery_test(Ylist, e.field, state_recovery2, w=w)
[0 0 0 0 1] from state [1 0 0 0 0] = 00001 (00001 expected)
[0 0 0 1 0] from state [0 1 0 0 0] = 00010 (00010 expected)
[0 0 1 0 0] from state [0 0 1 0 0] = 00100 (00100 expected)
[0 1 0 0 1] from state [0 0 0 1 0] = 01000 (01000 expected)
[1 0 0 1 0] from state [0 0 0 0 1] = 10000 (10000 expected)
[0 0 1 0 1] from state [1 0 1 0 0] = 00101 (00101 expected)
[0 1 0 1 1] from state [0 1 0 1 0] = 01010 (01010 expected)
[1 0 1 1 0] from state [0 0 1 0 1] = 10100 (10100 expected)
[0 1 1 0 0] from state [1 0 1 1 0] = 01101 (01101 expected)
[1 1 0 0 1] from state [0 1 0 1 1] = 11010 (11010 expected)
[1 0 0 1 1] from state [1 0 0 0 1] = 10001 (10001 expected)
[0 0 1 1 1] from state [1 1 1 0 0] = 00111 (00111 expected)
[0 1 1 1 1] from state [0 1 1 1 0] = 01110 (01110 expected)
[1 1 1 1 1] from state [0 0 1 1 1] = 11100 (11100 expected)
[1 1 1 1 0] from state [1 0 1 1 1] = 11101 (11101 expected)
[1 1 1 0 0] from state [1 1 1 1 1] = 11111 (11111 expected)
[1 1 0 0 0] from state [1 1 0 1 1] = 11011 (11011 expected)
[1 0 0 0 1] from state [1 1 0 0 1] = 10011 (10011 expected)

And that’s great, everything is hunky-dory.

But wait — there’s more! Even though we reduced the \( O(n^2) \) storage requirement, the method shown above still feels kinda matrix-y; we’re using a bunch of linear equations and solving them one by one, and the matrix lurks unseen in the background like some kind of shadow cabal pulling the strings.

It turns out we can solve the state recovery problem in two — count ’em, TWO! — ways just by using our friend the shift register and completely ignoring matrix methods after all. We’ll look at our \( H_{25} \) example (polynomial bit vector = 100101) to show how it works.

State Recovery with LFSR Shifting, Method A

Suppose we have a shift register state \( S[k] \), and we observe the output sequence 11000: \( Y[k] = Y[k+1] = 1 \) and \( Y[k+2] = Y[k+3] = Y[k+4] = 0 \). We’re going to maintain a pair of 5-bit values, one representing state content \( S[k] \) and the other representing the validity \( V[k] \) of the bits, where 1 indicates we know the bit is valid and 0 means it isn’t.

For example if the state bits are 11101 and the validity bits are 10110 (\( V_4 = V_2 = V_1 = 1, V_3 = V_0 = 0 \)) then this means our knowledge of the actual state bits is 1?10? where the ? character denotes that we don’t know the content of that bit, because the corresponding validity bit was 0.

Let’s initialize the state bits \( S[k] \) to all zeros 00000 and the validity bits to all zeros 00000. Now the first output bit \( Y[k] = 1 \), which comes from the most significant bit of state, so that tells us that we can correct the state bits to 10000 and the validity bits to 10000.

Now let’s execute one step of our state update for a normal LFSR representing this field \( H_{25} \), so \( S[k+1] \) can be represented by 00101 state bits (shift left previous state 10000 to 100000 and since the high bit is 1, XOR with the polynomial bit vector 100101 to knock the number of bits back down to 5) and validity bits 00001. Huh? How did we know the validity bits are 00001? Well, the top bit validity \( V_4[k+1] = 0 \) because the state bit \( S_4[k+1] \) depends on the next lower bit \( S_3[k] \) on the previous cycle, and the validity of that bit \( V_3[k] = 0 \). Same for the next bits, and the only one we know for certain is the least significant bit, which depends on \( S_4[k] \) and we know that for certain.

The next step is to look at the next output bit \( Y[k+1] = 1 \), so we can correct bit 4 to a 1 and set the validity bit \( V_4[k+1] = 1 \), so now the state bits \( S[k+1] = \) 10101 and validity bits \( V[k+1] = \) 10001.

Then we execute one step of the LFSR state update again: the state bits 10101 become 01111, and the validity bits rotate left to 00011.

We keep doing this, alternating between correcting the most significant state bit from the known output bit, and updating the LFSR state, until we process 5 bits and the validity bits are 11111, at which point we know \( S[k+5] \) completely and we just execute the state update backwards to get back to \( S[k] \):

step                                        | state bits \(S\) | validity bits \(V\) | state knowledge
initialization                              | 00000            | 00000               | ?????
output \(Y[k] = 1\)                         | 10000            | 10000               | 1????
state update \(S[k+1] \leftarrow S[k]\)     | 00101            | 00001               | ????1
output \(Y[k+1] = 1\)                       | 10101            | 10001               | 1???1
state update \(S[k+2] \leftarrow S[k+1]\)   | 01111            | 00011               | ???11
output \(Y[k+2] = 0\)                       | 01111            | 10011               | 0??11
state update \(S[k+3] \leftarrow S[k+2]\)   | 11110            | 00111               | ??110
output \(Y[k+3] = 0\)                       | 01110            | 10111               | 0?110
state update \(S[k+4] \leftarrow S[k+3]\)   | 11100            | 01111               | ?1100
output \(Y[k+4] = 0\)                       | 01100            | 11111               | 01100
*state update \(S[k+5] \leftarrow S[k+4]\)  | 11000            | 11111               | 11000
*state update \(S[k+5] \rightarrow S[k+4]\) | 01100            | 11111               | 01100
state update \(S[k+4] \rightarrow S[k+3]\)  | 00110            | 11111               | 00110
state update \(S[k+3] \rightarrow S[k+2]\)  | 00011            | 11111               | 00011
state update \(S[k+2] \rightarrow S[k+1]\)  | 10011            | 11111               | 10011
state update \(S[k+1] \rightarrow S[k]\)    | 11011            | 11111               | 11011

Very simple! And since the validity bits always follow this sequence, regardless of the data, we don’t actually have to store them; they’re implied by the algorithm itself, so we just need to keep track of state bits. It also doesn’t matter what state bit pattern we start with; here we show 00000 to start with, but we could have picked 11111 or 10101; we’d get the same answer.

The two steps marked with a * aren’t necessary since they are inverses of each other, and could be optimized out at the cost of a minor increase in algorithm complexity — though I’d personally just keep the simpler algorithm at the cost of two LFSR updates.

Here is the same algorithm working on some more sample output patterns:

def state_recovery3(field, y):
    s = field.wrap(0)
    bitmask = 1 << (field.degree - 1)
    shiftcount = 0
    for b in y:
        # replace the topmost state bit with the corresponding output bit
        s += (b*bitmask) ^ (s.coeffs & bitmask)
        s <<= 1
        shiftcount += 1
    s >>= shiftcount
    return s

for poly in [0x25, 0x1c3]:
    field = GF2QuotientRing(poly)
    n = field.degree
    print "{0:{n1}}output{0:{n2}}state{0:{n3}}check output from state".format("",
                                                                n1=(n*3//2 - 3),
                                                                n2=(n*2 - 4),
                                                                n3=(n//2 - 1))
    for k in xrange(n):
        for b in [0,1]:
            pattern = [b]*k + [1-b] + [b]*(n-1-k)
            s = state_recovery3(field, pattern)
            y = gather(s, n)
            print '{0} {1:0{n}b} {2}'.format(y, s.coeffs, pattern, n=n)
            assert y == pattern
    output      state check output from state
[1, 0, 0, 0, 0] 10010 [1, 0, 0, 0, 0]
[0, 1, 1, 1, 1] 01110 [0, 1, 1, 1, 1]
[0, 1, 0, 0, 0] 01001 [0, 1, 0, 0, 0]
[1, 0, 1, 1, 1] 10101 [1, 0, 1, 1, 1]
[0, 0, 1, 0, 0] 00100 [0, 0, 1, 0, 0]
[1, 1, 0, 1, 1] 11000 [1, 1, 0, 1, 1]
[0, 0, 0, 1, 0] 00010 [0, 0, 0, 1, 0]
[1, 1, 1, 0, 1] 11110 [1, 1, 1, 0, 1]
[0, 0, 0, 0, 1] 00001 [0, 0, 0, 0, 1]
[1, 1, 1, 1, 0] 11101 [1, 1, 1, 1, 0]
         output            state   check output from state
[1, 0, 0, 0, 0, 0, 0, 0] 11100001 [1, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 1, 1, 1, 1, 1, 1] 01011111 [0, 1, 1, 1, 1, 1, 1, 1]
[0, 1, 0, 0, 0, 0, 0, 0] 01110000 [0, 1, 0, 0, 0, 0, 0, 0]
[1, 0, 1, 1, 1, 1, 1, 1] 11001110 [1, 0, 1, 1, 1, 1, 1, 1]
[0, 0, 1, 0, 0, 0, 0, 0] 00111000 [0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 1, 1, 1, 1, 1] 10000110 [1, 1, 0, 1, 1, 1, 1, 1]
[0, 0, 0, 1, 0, 0, 0, 0] 00011100 [0, 0, 0, 1, 0, 0, 0, 0]
[1, 1, 1, 0, 1, 1, 1, 1] 10100010 [1, 1, 1, 0, 1, 1, 1, 1]
[0, 0, 0, 0, 1, 0, 0, 0] 00001110 [0, 0, 0, 0, 1, 0, 0, 0]
[1, 1, 1, 1, 0, 1, 1, 1] 10110000 [1, 1, 1, 1, 0, 1, 1, 1]
[0, 0, 0, 0, 0, 1, 0, 0] 00000111 [0, 0, 0, 0, 0, 1, 0, 0]
[1, 1, 1, 1, 1, 0, 1, 1] 10111001 [1, 1, 1, 1, 1, 0, 1, 1]
[0, 0, 0, 0, 0, 0, 1, 0] 00000011 [0, 0, 0, 0, 0, 0, 1, 0]
[1, 1, 1, 1, 1, 1, 0, 1] 10111101 [1, 1, 1, 1, 1, 1, 0, 1]
[0, 0, 0, 0, 0, 0, 0, 1] 00000001 [0, 0, 0, 0, 0, 0, 0, 1]
[1, 1, 1, 1, 1, 1, 1, 0] 10111111 [1, 1, 1, 1, 1, 1, 1, 0]

Tada!

State Recovery with LFSR Shifting, Method B

This is a very similar approach to method A but going backwards; we start by looking at state \( S[k+5] \) and perform a multiplication by \( x^{-1} \) at each step: shift right, but first, if the least significant bit is 1, XOR the LFSR with the polynomial bit vector of the LFSR characteristic polynomial. This time, at each step, we replace the least significant bit with output \( Y[k+j] \) before shifting right; after the shift-right this bit ends up in the most significant state bit:

step                                        | state bits \(S\) | validity bits \(V\) | state knowledge
initialization \(S[k+5]\)                   | 00000            | 00000               | ?????
output \(Y[k+4] = 0\)                       | 00000            | 00001               | ????0
state update \(S[k+5] \rightarrow S[k+4]\)  | 00000            | 10000               | 0????
output \(Y[k+3] = 0\)                       | 00000            | 10001               | 0???0
state update \(S[k+4] \rightarrow S[k+3]\)  | 00000            | 11000               | 00???
output \(Y[k+2] = 0\)                       | 00000            | 11001               | 00??0
state update \(S[k+3] \rightarrow S[k+2]\)  | 00000            | 11100               | 000??
output \(Y[k+1] = 1\)                       | 00001            | 11101               | 000?1
state update \(S[k+2] \rightarrow S[k+1]\)  | 10010            | 11110               | 1001?
output \(Y[k] = 1\)                         | 10011            | 11111               | 10011
state update \(S[k+1] \rightarrow S[k]\)    | 11011            | 11111               | 11011

We get the same answer with fewer steps; we don’t have to shift the state back after updating the LFSR with all the input bits. The only downside is that we have to feed in the output bits in reverse order.

def state_recovery4(field, y):
    s = field.wrap(0)
    for b in y[::-1]:
        # replace the least significant state bit with the corresponding output bit
        s += b ^ (s.coeffs & 1)
        s >>= 1
    return s

for poly in [0x25, 0x1c3]:
    field = GF2QuotientRing(poly)
    n = field.degree
    print "{0:{n1}}output{0:{n2}}state{0:{n3}}check output from state".format("",
                                                                n1=(n*3//2 - 3),
                                                                n2=(n*2 - 4),
                                                                n3=(n//2 - 1))
    for k in xrange(n):
        for b in [0,1]:
            pattern = [b]*k + [1-b] + [b]*(n-1-k)
            s = state_recovery4(field, pattern)
            y = gather(s, n)
            print '{0} {1:0{n}b} {2}'.format(y, s.coeffs, pattern, n=n)
            assert y == pattern
    output      state check output from state
[1, 0, 0, 0, 0] 10010 [1, 0, 0, 0, 0]
[0, 1, 1, 1, 1] 01110 [0, 1, 1, 1, 1]
[0, 1, 0, 0, 0] 01001 [0, 1, 0, 0, 0]
[1, 0, 1, 1, 1] 10101 [1, 0, 1, 1, 1]
[0, 0, 1, 0, 0] 00100 [0, 0, 1, 0, 0]
[1, 1, 0, 1, 1] 11000 [1, 1, 0, 1, 1]
[0, 0, 0, 1, 0] 00010 [0, 0, 0, 1, 0]
[1, 1, 1, 0, 1] 11110 [1, 1, 1, 0, 1]
[0, 0, 0, 0, 1] 00001 [0, 0, 0, 0, 1]
[1, 1, 1, 1, 0] 11101 [1, 1, 1, 1, 0]
         output            state   check output from state
[1, 0, 0, 0, 0, 0, 0, 0] 11100001 [1, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 1, 1, 1, 1, 1, 1] 01011111 [0, 1, 1, 1, 1, 1, 1, 1]
[0, 1, 0, 0, 0, 0, 0, 0] 01110000 [0, 1, 0, 0, 0, 0, 0, 0]
[1, 0, 1, 1, 1, 1, 1, 1] 11001110 [1, 0, 1, 1, 1, 1, 1, 1]
[0, 0, 1, 0, 0, 0, 0, 0] 00111000 [0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 1, 1, 1, 1, 1] 10000110 [1, 1, 0, 1, 1, 1, 1, 1]
[0, 0, 0, 1, 0, 0, 0, 0] 00011100 [0, 0, 0, 1, 0, 0, 0, 0]
[1, 1, 1, 0, 1, 1, 1, 1] 10100010 [1, 1, 1, 0, 1, 1, 1, 1]
[0, 0, 0, 0, 1, 0, 0, 0] 00001110 [0, 0, 0, 0, 1, 0, 0, 0]
[1, 1, 1, 1, 0, 1, 1, 1] 10110000 [1, 1, 1, 1, 0, 1, 1, 1]
[0, 0, 0, 0, 0, 1, 0, 0] 00000111 [0, 0, 0, 0, 0, 1, 0, 0]
[1, 1, 1, 1, 1, 0, 1, 1] 10111001 [1, 1, 1, 1, 1, 0, 1, 1]
[0, 0, 0, 0, 0, 0, 1, 0] 00000011 [0, 0, 0, 0, 0, 0, 1, 0]
[1, 1, 1, 1, 1, 1, 0, 1] 10111101 [1, 1, 1, 1, 1, 1, 0, 1]
[0, 0, 0, 0, 0, 0, 0, 1] 00000001 [0, 0, 0, 0, 0, 0, 0, 1]
[1, 1, 1, 1, 1, 1, 1, 0] 10111111 [1, 1, 1, 1, 1, 1, 1, 0]

State Recovery in libgf2

How does libgf2 — the Python module for computation in \( GF(2^n) \) which no one else actually uses — manage state recovery?

State recovery is available in libgf2.util.state_from_reversed_output:

def state_from_reversed_output(polycoeffs, rbits):
    '''
    Compute LFSR state from polynomial coefficients 
    and a reversed sequence of output bits

      polycoeffs: polynomial coefficients in bit vector form
      rbits:     any iterable that represents a reversed sequence of output bits 
    '''
    s = 0
    # run backwards until the state has no unknown bits
    for bit in rbits:
        if bit != 0:
            s ^= polycoeffs
        s >>= 1
    return s 

The implementation is essentially equivalent to the last method above, state_recovery4 (LFSR shifting method B), but instead of working with GF2Element objects we just deal with integers directly.

This is slightly optimized when compared with state_recovery4, where we set the least significant bit and then execute a “reverse update” of the LFSR (equivalent to multiplying by \( x^{-1} \)). Since the “reverse update” amounts to an XOR with the field’s characteristic polynomial if the least significant bit is 1, followed by a shift-right, the least significant bit previously present in the state s is irrelevant; we can just do this correction step based on each of the output bits.
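
In plain-integer terms the whole correction-and-shift collapses to a couple of lines (a sketch; reverse_step is a hypothetical helper, and the bits fed in are the reversed output 11000 from the Method B example):

def reverse_step(s, bit, polycoeffs=0x25):
    # if the (replaced) LSB is 1, XOR with p(x); either way the shift
    # discards bit 0, so the old LSB of s never matters
    if bit:
        s ^= polycoeffs
    return s >> 1

s = 0
for bit in [0, 0, 0, 1, 1]:   # y[k+4], y[k+3], ..., y[k] for output 11000
    s = reverse_step(s, bit)
print('{0:05b}'.format(s))    # 11011, matching the Method B table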

Here’s an example:

from libgf2.util import state_from_reversed_output

y = H25.wrap(1)
field = H25
n = field.degree
fieldmask = (1 << n) - 1
print "note: output has least significant bit = newest output bit"
output = 0
for _ in xrange(20):
    output = (output << 1) & fieldmask | ((y << (n-1)).coeffs >> (n-1))
    # just to double-check that we're not cheating, compare against gather():
    output_bits = gather(y, n)
    assert output == sum(b << (n-1-k) for k,b in enumerate(output_bits))
    print 'state {0:05b} -> output {1:05b} -> recovered state {2:05b}'.format(
                y.coeffs,
                output,
                state_from_reversed_output(field.coeffs, output_bits[::-1]))
    y <<= 1
note: output has least significant bit = newest output bit
state 00001 -> output 00001 -> recovered state 00001
state 00010 -> output 00010 -> recovered state 00010
state 00100 -> output 00100 -> recovered state 00100
state 01000 -> output 01001 -> recovered state 01000
state 10000 -> output 10010 -> recovered state 10000
state 00101 -> output 00101 -> recovered state 00101
state 01010 -> output 01011 -> recovered state 01010
state 10100 -> output 10110 -> recovered state 10100
state 01101 -> output 01100 -> recovered state 01101
state 11010 -> output 11001 -> recovered state 11010
state 10001 -> output 10011 -> recovered state 10001
state 00111 -> output 00111 -> recovered state 00111
state 01110 -> output 01111 -> recovered state 01110
state 11100 -> output 11111 -> recovered state 11100
state 11101 -> output 11110 -> recovered state 11101
state 11111 -> output 11100 -> recovered state 11111
state 11011 -> output 11000 -> recovered state 11011
state 10011 -> output 10001 -> recovered state 10011
state 00011 -> output 00011 -> recovered state 00011
state 00110 -> output 00110 -> recovered state 00110

That’s pretty much the end of the story. State recovery is easy and we don’t have to resort to matrices after all.

Well then, can’t we throw away matrices and rid ourselves of those beloved historical dregs?

Are there any LFSR-related problems that do require matrix algebra, and can’t be solved using straightforward bit manipulation or finite fields? I’m not sure. The only one that comes to mind is testing sequences for linear independence: if you have some sequences from outputs of an \( N \)-bit LFSR, which are \( y_1[k], y_2[k], y_3[k], \ldots, y_{m-1}[k] \) and another sequence \( y_m[k] \), can you express \( y_m[k] \) in terms of an XOR of some subset of the other sequences? This is easy with matrices; you construct a matrix of the first N terms of all the sequences:

$$ M_m = \begin{bmatrix} y_1[0] & y_1[1] & y_1[2] & \ldots & y_1[N-1] \cr y_2[0] & y_2[1] & y_2[2] & \ldots & y_2[N-1] \cr y_3[0] & y_3[1] & y_3[2] & \ldots & y_3[N-1] \cr \vdots & \vdots & \vdots & \ddots & \vdots \cr y_m[0] & y_m[1] & y_m[2] & \ldots & y_m[N-1] \cr \end{bmatrix} $$

Then you find the rank of \( M_m \) and compare it with the rank of \( M_{m-1} \) (the same matrix with the last row removed). The rank tells you how many rows are linearly independent, so if the rank of \( M_m \) and \( M_{m-1} \) is the same, then the \( y_m[k] \) sequence doesn’t have any new information and you can express it in terms of the other sequences, using more matrix algebra to figure out which ones to XOR together. If the rank of \( M_m \) is one more than the rank of \( M_{m-1} \), then \( y_m[k] \) can’t be expressed in this way.
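
Here’s a sketch of that rank test with each row of \( M \) packed into an integer (gf2_rank is a hypothetical helper, and the three example rows are made up so that the third is the XOR of the first two):

def gf2_rank(rows):
    """Rank over GF(2) of a list of rows, each packed into an integer."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        lsb = pivot & -pivot          # lowest set bit of the pivot row
        rows = [r ^ pivot if r & lsb else r for r in rows]
    return rank

seqs = [0b10010, 0b01001, 0b11011]    # third row = XOR of the first two
print(gf2_rank(seqs[:2]))             # 2
print(gf2_rank(seqs))                 # 2 -> same rank, so the third is dependent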

Brute force. That’s linear algebra for you.

There’s probably some convoluted way to do the same thing with the algebra of finite fields, but I have no idea how, and even if someone showed me, I don’t think I would like it.

Wrapup

We discussed matrix representations for an LFSR, in particular how multiplication by the companion matrix \( C \) of the LFSR’s characteristic polynomial is equivalent to multiplication by \( x \) in the corresponding finite field.

We looked at the problem of time shifts (predicting output bit \( Y[k+d] \) from state vector \( S[k] \)) and saw how it can be solved by computing the parity of \( S[k] \) after a bitmask is applied, where the bitmask can be determined from raising the companion matrix \( C \) to the \( d \)th power — or equivalently, looking at the coefficients of \( x^{n-1} \) in the finite field elements \( x^d, x^{d+1}, \ldots, x^{d+n-1} \).

We looked at the problem of state recovery (predicting state vector \( S[k] \) from the output bits \( Y[k], Y[k+1], \ldots, Y[k+n-1] \)) and saw how it can be solved by using powers of the companion matrix \( C^0, C^1, C^2, \ldots, C^{n-1} \) and then a pair of methods based on the solution of a matrix equation; one used a brute-force matrix solver using scipy.linalg.solve_triangular, and the second was a manual solution based on the structure of a Toeplitz matrix. We also looked at two equivalent methods using bit shifting and LFSR updates.

In both cases we were able to find equivalent non-matrix solutions that require less storage space and execution time. This isn’t always the case, but the matrix representation of an LFSR is one instance where the matrices in question have a lot of structure and redundancy. In such instances, matrices may be helpful for understanding and in performing brute-force computations, but there are usually alternatives that execute more quickly.

Next time we’ll be looking at output decimation.


© 2017 Jason M. Sachs, all rights reserved.



Comment by weetabixharry, October 15, 2021

Jason,

Firstly, thank you for this fantastic resource. I have occasionally browsed through your articles during the last couple of years to try to strengthen my understanding of LFSRs.

I was recently faced with what I now realize is called the "state recovery" problem and I'm really desperate to understand this better. I arrived at exactly the matrix solution you gave above (sorry for the "historical dregs"!):

$$S[k] = W^{-1}Y[k]$$

I was very dissatisfied that this contained a matrix inverse (or, like you say, not really computing the inverse, but still a bit of heavy lifting). So I did some experimentation and, firstly, the result for Fibonacci LFSRs was quite obvious: $S[k]$ is simply equal to $Y[k]$ (or the flipped version, depending on which direction the LFSR shifts).

For Galois LFSRs (like in your article), the structure of $W^{-1}$ was only a little less obvious. In one case (the opposite LFSR shift direction to your article), $W^{-1}$ appears to be remarkably similar to your $\bar{W}$. I say "appears to be" because I have tested it for some millions of cases, but have not been able to prove it. So I got quite excited when I saw your Toeplitz matrix, until I realized you're not constructing $W^{-1}$ after all.

For the type of LFSR you use in your article, I construct $W^{-1}$ like this in Python:

# This is how I represent H25 in my code
polynomial = [5,2,0]

# Construct inv(W) based on my observations
n = polynomial[0]
Winv = np.eye(n)
for i in range(1,len(polynomial)-1):
    Winv += np.eye(n, k=n-polynomial[i])

Winv = np.fliplr(Winv)

And this produces Winv equal to:

array([[0., 1., 0., 0., 1.],
       [1., 0., 0., 1., 0.],
       [0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0.]])

which is a Hankel matrix, rather similar to W (just reflected through the main antidiagonal?). (For the opposite shift direction, it is Toeplitz).

This is where my head starts spinning in circles. I cannot figure out if or how my result relates to what you have written above. This $W^{-1}$ is equal to a vertically flipped $\bar{W}$ and you solve a system of equations for a vertically flipped state vector... but somehow I can't connect the dots. Maybe this is just a red herring because $\bar{W}$ (at least in this example) is involutory: $\bar{W}^{-1}=\bar{W}$. Quite possibly this doesn't hold in general.


Could you help to shed any light on this? I'm baffled that $W^{-1}$ seems to have such a simple and consistent structure, but I can't seem to prove it. In the meantime, I'll re- and re-re-read your article, as you do provide numerous "state recovery" methods.

Comment by jms_nh, October 15, 2021

I'm glad you appreciated the article, and thanks for your detailed comment. I'll try to follow up on it when I get a chance... just wanted to warn you that it may be a while. I'll need to get my head back into the topic of LFSRs and matrices, and it's not right now (again the main reason I wrote these articles was for myself :-) so I could free up some stack space, so to speak, for other matters).

Comment by weetabixharry, October 15, 2021

Thank you for your reply. I will write something here if I get to the bottom of it. There should be ample information in the article, I just need to sift through it with my thinking hat on...

Comment by weetabixharry, October 16, 2021

In the end, I believe I was able to prove this structure of $W^{-1}$ by converting the "shifting, method B" described above into matrix expressions. I'll summarize below, but I'm aware I'm most likely talking to myself.

If I write the companion matrix as: \[ C=\left[ \begin{array}{ccccc} 0 & 0 & \cdots & 0 & c_{0} \\ 1 & 0 & \cdots & 0 & c_{1} \\ 0 & 1 & \vdots & \vdots & \vdots \\ \vdots & \cdots & \ddots & 0 & \vdots \\ 0 & \cdots & 0 & 1 & c_{n-1} \end{array} \right] \] (where $c_{0}$ is 1), then to shift backwards I need $C^{-1}$, which I prefer to denote $\bar{C}$ to keep the notation a bit tidier: \[ \bar{C}=C^{-1}=\left[ \begin{array}{ccccc} c_{1} & 1 & 0 & \cdots & 0 \\ c_{2} & 0 & 1 & 0 & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ c_{n-1} & 0 & \cdots & 0 & 1 \\ c_{n} & 0 & \cdots & \cdots & 0 \end{array} \right] \] (where $c_{n}$ is 1). Then each "update-insert" operation in "shifting, method B" can be implemented like this: \[ \underline{S}_{k}=\bar{C}_{0}\underline{S}_{k+1}+\underline{Y}_{k-1} \] where $\bar{C}_{0}$ is defined as $\bar{C}$ with its first row set to zeros. And I have defined $\underline{Y}_{k}=[Y_{k},0,0,\dots ,0]^{T}$. This just allows us to do every update (except for the last one) in a way that zeroes the LSB of the state knowledge, then overwrites it with the next $Y_{k}$ bit. Expanding terms, for example for the $n=5$ case, we get: \[ \underline{S}_{0}=\bar{C}\left( \bar{C}_{0}\left( \bar{C}_{0}\left( \bar{C} _{0}\left( \bar{C}_{0}\underline{Y}_{4}+\underline{Y}_{3}\right) +\underline{ Y}_{2}\right) +\underline{Y}_{1}\right) +\underline{Y}_{0}\right) \] (note that the last update uses $\bar{C}$ and not $\bar{C}_{0}$). There are then some massive simplifications because we only care about the first column of $\bar{C}_{0}^{k}$ (because only the first element of $\underline{Y} _{k}$ can be nonzero). And it's quite easy to see that the first column of $ \bar{C}^{k}$ has a very simple structure (a zero, followed by shift_up_by_k$\{[c_{2},c_{3},\dots ,c_{n}]^{T}\}$). This leads to: \[ \underline{S}_{0}=\bar{C}\left( \left[ \begin{array}{c} 0 \\ c_{n} \\ 0 \\ 0 \\ \vdots \\ 0 \end{array} \right] Y_{n-1}+\left[ \begin{array}{c} 0 \\ c_{n-1} \\ c_{n} \\ 0 \\ \vdots \\ 0 \end{array} \right] Y_{n-2}+\dots +\left[ \begin{array}{c} 0 \\ c_{2} \\ c_{3} \\ c_{4} \\ \vdots \\ c_{n} \end{array} \right] Y_{1}+\left[ \begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{array} \right] Y_{0}\right) \] and after some rearrangements: \[ \underline{S}_{0}=\left[ \begin{array}{cccccc} c_{1} & c_{2} & c_{3} & c_{4} & \cdots & c_{n} \\ c_{2} & c_{3} & c_{4} & \cdots & c_{n} & 0 \\ c_{3} & c_{4} & \cdots & c_{n} & 0 & 0 \\ c_{4} & \cdots & c_{n} & 0 & 0 & 0 \\ \vdots & \cdot & \cdot & \vdots & \vdots & \vdots \\ c_{n} & 0 & 0 & 0 & 0 & 0 \end{array} \right] \left[ \begin{array}{c} Y_{0} \\ Y_{1} \\ Y_{2} \\ \vdots \\ \vdots \\ Y_{n-1} \end{array} \right] \] and that "anti-triangular" Hankel matrix matches the Python code I wrote above.
