# One Clock Cycle Polynomial Math

Error correction codes and cryptographic computations are most easily performed working with $GF(2^n)$ polynomials. By using very special values of $n$ we can build circuits which multiply and square in one clock cycle on an FPGA. These circuits come about by flipping back and forth between a standard polynomial basis and a normal basis representation of elements in $GF(2^n)$.

A normal basis is yet another form of polynomial but instead of adding powers of $\beta$ we add squares of powers: $$a_{n-1}\beta^{2^{n-1}}+a_{n-2}\beta^{2^{n-2}}+...+ a_2\beta^{2^2}+a_1\beta^2+a_0\beta$$ Note that the last term is $\beta^{2^0}$. The first piece of magic to take notice of is that $\beta^{2^n}=\beta$. This comes from Fermat's "little" theorem. The advantage of this representation is that the operation of squaring becomes a simple left rotation - a single clock cycle on an FPGA.

To see why this works, we first notice that when squaring a polynomial in $GF(2^n)$ all the cross terms cancel. The only terms that don't cancel are $a_j\beta^{2^j} \ast a_j\beta^{2^j}=a_j\beta^{2^{j+1}}$. The coefficient $a_j$ is now in front of the term $\beta^{2^{j+1}}$ which is the next term to the left. The term $a_{n-1}$ goes down to $a_0$ due to Fermat's "little" theorem. We have just computed the square of a polynomial using a left rotation by one bit.

A normal basis is "optimal" when it takes the fewest number of terms to compute a multiply. For specific values of $n$, a lot of magic happens which makes the circuits to implement the math exceptionally simple. A "type 1 normal basis" has the following rules: $n+1$ is prime and 2 is a generator for $Z_{n+1}$. The values of $n<500$ for which this is true are: 2, 4, 10, 12,18, 28, 36, 52, 58, 60, 66, 82, 100, 106, 130, 138, 148, 162, 172, 178, 180, 196, 210, 226, 268, 292, 316, 346, 348, 372, 378, 388, 418, 420, 429, 442, 460, 466 and 491.

There is one other optimal normal basis form but the type 1 form has a lot of advantages in an FPGA as well as microcontrollers. The fundamental advantage of type 1 is the ability to flip between a normal basis and a polynomial basis representation for the same value. The second rule is what allows this. Since $2$ is a generator, $2^j$ will be one of the values between $1$ and $n$. For an example, let's look at $n=10$:

j |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |

$2^j \mod 11$ |
1 | 2 | 4 | 8 | 5 | 10 | 9 | 7 | 3 | 6 | 1 |

A polynomial basis (or standard basis) requires we work modulo an irreducible polynomial. By choosing the irreducible polynomial of $\sum_{i=0}^{i=n}\beta^i$ the transform between normal basis and polynomial basis is a simple permutation. Why this choice works is way beyond what I want to discuss, for now we'll just say the mathematicians figured it out a long time ago and we will run with it.

To understand what the above table means let's look at several terms in both bases. $$\begin{array}{c}\beta^{2^0}\rightarrow\beta\\ \beta^{2^1}\rightarrow\beta^2\\ \beta^{2^8}\rightarrow\beta^3\end{array}$$ But something strange happens at $j=5$:

$$\beta^{2^5}\rightarrow\beta^{10}=\beta^9+\beta^8+\beta^7+\beta^6+\beta^5+\beta^4+\beta^3+\beta^2+\beta+1$$ modulo the irreducible polynomial. To map $1 = \beta^0$ into $\beta^{2^j}$ we need to sum over all the $\beta^{2^j}$ terms because only $\beta^{2^5}$ carries the $1$ with it. So we have $$1\rightarrow\sum_{j=0}^{j=n-1}\beta^{2^j}$$

Our polynomial in standard basis can be transformed to normal basis for squaring, then we can use the permutation table to transform back. To multiply, we follow the form described by J. K. Wolf in Discrete Mathematics 106/107 (1992) pg 497: $$\sum_{i=0}^{i=n-1} a_i\beta^i \ast \sum_{j=0}^{j=n-1} b_j\beta^j = \sum_{k=0}^{k=n-1} c_k\beta^k$$ $$ c'_k = \sum_{j=0}^{j=n-1} b_j a_{k-j\mod n+1} \text{ for } k=0\dotso n$$ The fundamental trick here is the coefficient $a_n = 0$ and the term $c_n$ is used to modify the result with $$c_k = c_n + c'_k$$ Setting $a_n=0$ effectively "knocks out" one term in each sum. The extra term $c_n$ actually comes from the use of an extension into $\beta^{n+1} + 1$ as the basis polynomial which is $(\beta+1)$ times the "all ones" irreducible polynomial.

So what does all that mean? The multiplication is just an AND gate, the sums are XOR gates. The last step is a complement of the resulting $c'_k$ if $c_n$ is set, otherwise nothing happens and $c_k = c'_k$. We can clock in the $a_i$ coefficients and $b_j$ coefficients and the result falls out $c_k$. Obviously the clock timing depends on the cascade of gates.

Another approach is to use n clocks to perform the multiply. This drastically reduces the area consumed by the logic. On each clock $j$, if $b_j$ is set, XOR the $a$ register with the $c$ register, then rotate $a$ left. On clock n, use $c_n$ to complement the output if $c_n=1$. Note that $a$ and $c$ are $n+1$ bits wide and the bit $a_n$ is set to zero. This $0$ rotates around and "takes out" the correct bit each time. On a microcontroller, this is the most sensible way to do the computation.

The point of all this is speed. By using the right size field, polynomial multiplies can be computed in a single clock cycle on an FPGA. The same field which allows this also can be represented in a normal basis which makes squaring a polynomial possible in one clock cycle as well. Switching between the two representations is a simple permutation (plus complement if a certain bit is set). Everything else we need to do can be done with these basic functions. If you need to go fast, then one of the above listed field sizes is the best place to start.

**Previous post by Mike :**

Elliptic Curve Cryptography

**Next post by Mike :**

Polynomial Inverse

## Comments:

- Comments
- Write a Comment Select to add a comment

The point is you are going way, way too deep. This is trivially simple. As you point out $(a+b)^2$ has $2ab$ as a cross term. But $2\mod 2 = 0, so the cross term is 0. The other way to describe it is that you get a cross term from each group and when you add them together you get 0. That?s why XOR works as the sum of terms. I?ve always seen it written as a "field over $GF(2^n)$" , and they always have coefficients in $GF(2)$. I suppose to be correct they should say "an extension field over $GF(2^n)$". Otherwise, XOR doesn't work as the summation operator.

we have :

a^2= 0x6+4x4+1+0x5+0x3+3x2=4x4+3x2+1 , 4x4 means 4*x^4 etc.

b^2=2x4 + 5x3 + 4x2

a*b= 0(x^5) + 0(x^4) + 1(x^4) +3(x^3) + 3(x^2)+2x

2*a*b= 2(x^4) +6(x^3) + 6(x^2)+4x

a^2+2*a*b+b^2= x4+4x3+6x2+4x+1

=======================================

a+b=7x3+x2+2x+1

(a+b)^2=(a+b)*(a+b)= (7x3+x2+2x+1 ) * ( 7x3+x2+2x+1)=0x6+ 7x5 + 0x4 + 7x3 + 7x5 + x4 + 2x3 + x2 + 0x4 + + 2x3 + 4x2 + 2x + 7x3 +x2 + 2x + 1=

=x4 + 4x3 + 6x2 + 4x + 1

so in GF(2^3) (a+b)^2 is not equal to a^2 + b^2 . It will be happened (a+b)^2=a^2+b^2 at the case where the multiplication (to ensure that the field is closed undrer multiplication , no coefficient longer than 3 bits ) was defined and moduloP(x) , where P(x) is a prime polynomial in such a way that 2*a*b moduloP(x)=0.

Given the polynomial x2+1 in G(8) we will calculate (x2+1)^2 using multiplication as in G(2) and addition as XOR . Multiplication is based on P(x) = x3+x2+1 to ensure coefficients of 3 bits long .We have:

2X2+1 = (2 0 1)

x 2 0 1

2 0 1

-----------------

2 0 1

4 0 2 0 0

-----------------------------

= 4 0 0 0 1

0r 100 000 000 000 001

it seems that the coefficients of normal base will be:

110 000 000 000 000 or 6 0 0 0 0

so, one cyclic left shift gives 4 0 0 0 1.

How it is calculated the 6 0 0 0 0 using a normal base?

Ill try it. Interesting.

Thanks.

2 0 1

-----------------

2 0 1

4 0 2 0 0

-----------------------------

= 4 0 0 0 1

element normal base

000 000

001 100

010 010

011 110

100 111

101 011

110 101

111 001

Primitive Normal Bases for Finite Fields

H. W. Lenstra, Jr. and R. J. Schoof

Mathematics of Computation

Vol. 48, No. 177 (Jan., 1987), pp. 217-231

which proves "any finite extension of a finite field has a normal basis consisting of primitive roots." So there _is_ a normal basis over GF(8), but I don't know how you map it back to a standard basis (without reading that reference!).

I like your order, the square of each element goes to the symmetric position.

A root ? of p(x)=0 is the ?=010.

The set of the distinct roots of p(x) is:

B={ ?, ?2, ?4 } , ?^4= ?*?3=?2+?+1 . The elements of the set B are linearly independent and so they are a normal basis. And ?=010 and ?^2=100 and b^3=101mod(p(x) and ?^4=111 mod(p(x))

We have now:

Element normal basis representation

000 000

001 111

010 001

011 110

100 010

101 101

100 011

111 100

http://faculty.washington.edu/manisoma/ee540/EE540finite.pdf

Amela Mutarovic-Ribic, University of Sarajevo

Peng Ning , Yiqun Lisa Yin North Carolina State University

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Registering will allow you to participate to the forums on ALL the related sites and give you access to all pdf downloads.