Reed-Solomon error-correcting code decoder
Preliminaries
The Reed-Solomon procedures take place within the framework of a user-chosen field F. The field is usually GF(2^8) for convenient byte-oriented processing on computers, but it could instead be GF(2^4), GF(2^12), ℤ_73, etc. We need a primitive element / generator α for the field. The generator must be such that the values {α^0, α^1, α^2, ..., α^(|F|−2)} are all unique and non-zero; hence the powers of α generate all the non-zero elements of the field F. (|F| is the size of the field, i.e. the total number of distinct elements/values.)
We choose k to be the message length. k is an integer such that 1 ≤ k < |F|. Each message to be encoded is a sequence/block of k values from the field F. For example, we might choose k = 25 so that each codeword conveys 25 payload (non-redundant) values. (The case k = 0 is obviously degenerate because it means there is no useful information to convey.)
We choose m to be the number of error correction values by which to expand the message. m is an integer such that 1 ≤ m < |F| − k. (The case m = 0 is degenerate because it means no RS ECC is added, so the entire message has no protection whatsoever.) Note that when a message is expanded with m error correction values, the expanded message (a.k.a. codeword) can tolerate up to ⌊m/2⌋ errors and still be decoded perfectly. For example, adding 6 EC values allows us to fix any codeword with up to 3 erroneous values.
We define n = k + m to be the block size / codeword length after encoding. Note that n is a positive integer that satisfies 2 ≤ n < |F|. So if we want big blocks with a lot of error-correcting capability, then we need a sufficiently large field as the foundation.
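As a concrete illustration of these preliminaries, here is a minimal sketch in Python over the example field ℤ_73 mentioned above. The choices α = 5 (assumed here to be a primitive root mod 73, and verified by the code), k = 25, and m = 6 are illustrative, not part of the algorithm itself.

```python
# Illustrative parameters over the prime field Z_73; arithmetic is ordinary
# integer arithmetic mod 73. All concrete values below are example choices.
P = 73       # |F|, the field size
ALPHA = 5    # candidate generator (assumption: primitive root mod 73)

# The powers alpha^0 .. alpha^(|F|-2) must be distinct and non-zero,
# i.e. they enumerate every non-zero field element exactly once.
powers = [pow(ALPHA, i, P) for i in range(P - 1)]
assert len(set(powers)) == P - 1 and 0 not in powers

# Pick message length k and error-correction count m within the stated bounds.
k, m = 25, 6
n = k + m                      # codeword length
assert 1 <= k < P
assert 1 <= m < P - k
assert 2 <= n < P
print(n, m // 2)               # 31 values per codeword, up to 3 fixable errors
```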
Systematic encoder
Reed-Solomon error-correcting codes come in a number of flavors, of equivalent error-correcting power but different pragmatic handling. The variant that we use is the BCH view with systematic encoding, which means that the original message is treated as a sequence of coefficients for a polynomial, and the codeword after encoding is equal to the message with some error-correcting data appended to it.
Define the generator polynomial based on m and α:
g(x) = ∏_{i=0}^{m−1} (x − α^i) = (x − α^0)(x − α^1)⋯(x − α^(m−1)).
This polynomial has degree m, and its leading coefficient is equal to 1.
Suppose the original message is the sequence of k values (M_0, M_1, ..., M_(k−1)), where each M_i is an element of the field F. Define the original message polynomial by simply using the values as monomial coefficients:
M(x) = ∑_{i=0}^{k−1} M_i x^i = M_0 x^0 + M_1 x^1 + ⋯ + M_(k−1) x^(k−1).
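To make the definition of g(x) concrete, the sketch below builds its coefficient list over the same assumed example field ℤ_73 with α = 5 and m = 6, multiplying out the factors (x − α^i) one at a time, then checks the two stated properties.

```python
P, ALPHA, m = 73, 5, 6   # assumed example field, generator, and EC count

def poly_mul(a, b, p):
    """Multiply two polynomials given as low-to-high coefficient lists, mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def poly_eval(a, x, p):
    """Evaluate a polynomial (low-to-high coefficients) at x, mod p."""
    return sum(c * pow(x, i, p) for i, c in enumerate(a)) % p

# g(x) = (x - alpha^0)(x - alpha^1) ... (x - alpha^(m-1))
g = [1]
for i in range(m):
    g = poly_mul(g, [(-pow(ALPHA, i, P)) % P, 1], P)

assert len(g) == m + 1 and g[-1] == 1    # degree m, leading coefficient 1
assert all(poly_eval(g, pow(ALPHA, i, P), P) == 0 for i in range(m))  # roots
```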
Define and calculate the Reed-Solomon codeword to be sent as the message polynomial shifted up by m positions, minus the remainder of that shifted polynomial modulo the generator polynomial:
s(x) = M(x)·x^m − [(M(x)·x^m) mod g(x)].
Note that the remainder polynomial [(M(x)·x^m) mod g(x)] has degree m−1 or less, so the monomial terms of the remainder don’t interact with the terms of M(x)·x^m. Overall, s(x) has degree n−1 or less.
By construction, the sent codeword polynomial s(x) has the property that s(x) ≡ 0 mod g(x); this will be useful shortly in the decoder.
We encode s(x) into a sequence of values in the straightforward way by breaking it up into n monomial coefficients:
s(x) = ∑_{i=0}^{n−1} s_i x^i = s_0 x^0 + s_1 x^1 + ⋯ + s_(n−1) x^(n−1).
The codeword we transmit is simply the sequence of n values (s_0, s_1, …, s_(n−1)), where each value s_i is an element of the field F.
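The whole encoder can be sketched as follows, again over the assumed example field ℤ_73 with α = 5. The remainder is computed by ordinary polynomial long division, which works because g(x) is monic.

```python
P, ALPHA = 73, 5   # assumed example field and generator

def rs_encode(msg, m, p=P, alpha=ALPHA):
    """Systematic encode: returns the n = len(msg) + m codeword coefficients
    (s_0, ..., s_{n-1}) of s(x) = M(x)*x^m - [(M(x)*x^m) mod g(x)]."""
    # Build g(x) as a low-to-high coefficient list.
    g = [1]
    for i in range(m):
        a = pow(alpha, i, p)   # multiply g by the factor (x - alpha^i)
        g = [((g[j - 1] if j > 0 else 0) - a * (g[j] if j < len(g) else 0)) % p
             for j in range(len(g) + 1)]
    # M(x)*x^m: shift the message up by m positions.
    rem = [0] * m + list(msg)
    # Long division by the monic g(x); afterwards rem[0..m-1] is the remainder.
    for i in range(len(rem) - 1, m - 1, -1):
        c = rem[i]
        if c:
            for j in range(m + 1):
                rem[i - m + j] = (rem[i - m + j] - c * g[j]) % p
    # Subtracting the remainder fills the low m slots; the top k slots are msg.
    return [(-rem[i]) % p for i in range(m)] + list(msg)
```

For example, `rs_encode([1, 2, 3], 4)` yields a 7-value codeword whose last 3 values are the original message, and by construction s(α^i) = 0 for 0 ≤ i < m.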
Peterson-Gorenstein-Zierler decoder
Calculating syndromes
Suppose the codeword we received is (r_0, r_1, …, r_(n−1)), where each value is an element of the field F. This is known. We define the received codeword polynomial straightforwardly:
r(x) = ∑_{i=0}^{n−1} r_i x^i = r_0 x^0 + r_1 x^1 + ⋯ + r_(n−1) x^(n−1).
On the receiving side here, we don’t know the sent values s_0, s_1, …, s_(n−1), but we will go ahead and define the error values anyway. Let e_0 = r_0 − s_0, e_1 = r_1 − s_1, …, e_(n−1) = r_(n−1) − s_(n−1). Define the error polynomial straightforwardly (again, we don’t know its value right now):
e(x) = r(x) − s(x) = ∑_{i=0}^{n−1} e_i x^i = e_0 x^0 + e_1 x^1 + ⋯ + e_(n−1) x^(n−1).
Now for some actual math: define the m syndrome values S_i for 0 ≤ i < m, by evaluating the received codeword polynomial at various powers of the generator:
S_i = r(α^i) = s(α^i) + e(α^i) = 0 + e(α^i) = e(α^i) = e_0 α^(0i) + e_1 α^(1i) + … + e_(n−1) α^((n−1)i).
This works because, by construction, s(α^i) = 0 for 0 ≤ i < m, which is because s(x) is divisible by (x − α^0)(x − α^1)⋯(x − α^(m−1)). Thus we see that the syndromes only depend on the errors that were added to the transmitted codeword, and don’t depend at all on the value of the transmitted codeword or the original message.
If all the syndrome values are zero, then the codeword is already correct, there is nothing to fix, and we are done.
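Computing the syndromes is a direct polynomial evaluation. The sketch below assumes the same example field ℤ_73 with α = 5, and checks the formula against a single hypothetical error.

```python
def syndromes(r, m, p=73, alpha=5):
    """S_i = r(alpha^i) for 0 <= i < m, given the received values r_0..r_{n-1}.
    All-zero syndromes mean the received codeword is already valid."""
    return [sum(rj * pow(alpha, i * j, p) for j, rj in enumerate(r)) % p
            for i in range(m)]

# Sanity checks: the zero word has zero syndromes, and a single error of value
# y at index j contributes y * alpha^(i*j) to syndrome S_i, as derived above.
n, m = 31, 6
assert syndromes([0] * n, m) == [0] * m
e = [0] * n
e[7] = 11                       # hypothetical single error of value 11
assert syndromes(e, m) == [11 * pow(5, 7 * i, 73) % 73 for i in range(m)]
```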
We can show all of these m syndrome equations explicitly:
S_0 = e_0 α^(0×0) + e_1 α^(1×0) + ⋯ + e_(n−1) α^((n−1)×0).
S_1 = e_0 α^(0×1) + e_1 α^(1×1) + ⋯ + e_(n−1) α^((n−1)×1).
⋯
S_(m−1) = e_0 α^(0(m−1)) + e_1 α^(1(m−1)) + ⋯ + e_(n−1) α^((n−1)(m−1)).
And rewrite this linear system as a matrix:
\[
\begin{bmatrix}
e_0 \alpha^{0 \times 0} + e_1 \alpha^{1 \times 0} + \cdots + e_{n-1} \alpha^{(n-1) \times 0} \\
e_0 \alpha^{0 \times 1} + e_1 \alpha^{1 \times 1} + \cdots + e_{n-1} \alpha^{(n-1) \times 1} \\
\vdots \\
e_0 \alpha^{0(m-1)} + e_1 \alpha^{1(m-1)} + \cdots + e_{n-1} \alpha^{(n-1)(m-1)}
\end{bmatrix}
=
\begin{bmatrix} S_0 \\ S_1 \\ \vdots \\ S_{m-1} \end{bmatrix}.
\]
And factorize the matrix:
\[
\begin{bmatrix}
\alpha^{0 \times 0} & \alpha^{1 \times 0} & \cdots & \alpha^{(n-1) \times 0} \\
\alpha^{0 \times 1} & \alpha^{1 \times 1} & \cdots & \alpha^{(n-1) \times 1} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha^{0(m-1)} & \alpha^{1(m-1)} & \cdots & \alpha^{(n-1)(m-1)}
\end{bmatrix}
\begin{bmatrix} e_0 \\ e_1 \\ \vdots \\ e_{n-1} \end{bmatrix}
=
\begin{bmatrix} S_0 \\ S_1 \\ \vdots \\ S_{m-1} \end{bmatrix}.
\]
Finding error locations
Choose ν (Greek lowercase nu) as the number of errors to try to find. We require 1 ≤ ν ≤ ⌊m/2⌋. Unless there are time or space constraints, it is best to set ν as large as possible to catch as many errors as the error-correcting code allows.
Let’s pretend we know the ν error locations as I_0, I_1, …, I_(ν−1). This is an orderless set of unique indexes into the received codeword of n values, so each element satisfies 0 ≤ I_i < n.
The significance of this set/sequence of indexes I_i is that the error values at these indexes may be non-zero, but all other error values must be zero. In other words, e_(I_0), e_(I_1), …, e_(I_(ν−1)) can each be any value (possibly zero), but the e_i values at all other indexes must be zero.
Define some new variables for old values, but based on the error location indexes I_i, for 0 ≤ i < ν:
X_i = α^(I_i).  Y_i = e_(I_i).
Because we know all the other e_i values are zero, we can substitute the new variables and rewrite the system of syndrome equations as follows:
\[
\begin{bmatrix}
X_0^0 & X_1^0 & \cdots & X_{\nu-1}^0 \\
X_0^1 & X_1^1 & \cdots & X_{\nu-1}^1 \\
\vdots & \vdots & \ddots & \vdots \\
X_0^{m-1} & X_1^{m-1} & \cdots & X_{\nu-1}^{m-1}
\end{bmatrix}
\begin{bmatrix} Y_0 \\ Y_1 \\ \vdots \\ Y_{\nu-1} \end{bmatrix}
=
\begin{bmatrix} S_0 \\ S_1 \\ \vdots \\ S_{m-1} \end{bmatrix}.
\]
(At this point we still don’t know any of the X_i or Y_i values. However, there is a clever multi-step procedure that will reveal them.)
Define the error locator polynomial based on the unknown X_i variables:
Λ(x) = ∏_{i=0}^{ν−1} (1 − X_i x) = 1 + Λ_1 x + Λ_2 x^2 + ⋯ + Λ_ν x^ν.
(In other words, after all the factors are multiplied and expanded, the polynomial Λ(x) has the sequence of ν+1 monomial coefficients (1, Λ_1, Λ_2, …, Λ_ν).)
By construction we know that for each 0 ≤ i < ν, we have:
0 = Λ(X_i^(−1)) = 1 + Λ_1 X_i^(−1) + Λ_2 X_i^(−2) + ⋯ + Λ_ν X_i^(−ν).
The polynomial is zero at these points because the product contains the factor (1 − X_i X_i^(−1)) = 1 − 1 = 0.
For 0 ≤ i < ν and arbitrary j ∈ ℤ, let’s multiply all sides of the equation by Y_i X_i^(j+ν):
(Y_i X_i^(j+ν))·0 = (Y_i X_i^(j+ν))·Λ(X_i^(−1)) = (Y_i X_i^(j+ν))(1 + Λ_1 X_i^(−1) + Λ_2 X_i^(−2) + ⋯ + Λ_ν X_i^(−ν)).
0 = Y_i X_i^(j+ν) Λ(X_i^(−1)) = Y_i X_i^(j+ν) + Λ_1 Y_i X_i^(j+ν−1) + Λ_2 Y_i X_i^(j+ν−2) + ⋯ + Λ_ν Y_i X_i^j.
Now sum this equation over our full range of i values:
0 = ∑_{i=0}^{ν−1} Y_i X_i^(j+ν) Λ(X_i^(−1))
  = ∑_{i=0}^{ν−1} (Y_i X_i^(j+ν) + Λ_1 Y_i X_i^(j+ν−1) + Λ_2 Y_i X_i^(j+ν−2) + ⋯ + Λ_ν Y_i X_i^j)
  = (∑_{i=0}^{ν−1} Y_i X_i^(j+ν)) + Λ_1 (∑_{i=0}^{ν−1} Y_i X_i^(j+ν−1)) + Λ_2 (∑_{i=0}^{ν−1} Y_i X_i^(j+ν−2)) + ⋯ + Λ_ν (∑_{i=0}^{ν−1} Y_i X_i^j)
  = S_(j+ν) + Λ_1 S_(j+ν−1) + Λ_2 S_(j+ν−2) + ⋯ + Λ_ν S_j.
By rearranging terms, this implies the following, which is valid for 0 ≤ j < ν:
Λ_ν S_j + Λ_(ν−1) S_(j+1) + ⋯ + Λ_1 S_(j+ν−1) = −S_(j+ν).
Substituting all valid values of j, we can form a system of ν linear equations:
Λ_ν S_0 + Λ_(ν−1) S_1 + ⋯ + Λ_1 S_(ν−1) = −S_ν.
Λ_ν S_1 + Λ_(ν−1) S_2 + ⋯ + Λ_1 S_ν = −S_(ν+1).
⋯
Λ_ν S_(ν−1) + Λ_(ν−1) S_ν + ⋯ + Λ_1 S_(2ν−2) = −S_(2ν−1).
We can rewrite and factorize this system into matrices and vectors:
\[
\begin{bmatrix}
S_0 & S_1 & \cdots & S_{\nu-1} \\
S_1 & S_2 & \cdots & S_\nu \\
\vdots & \vdots & \ddots & \vdots \\
S_{\nu-1} & S_\nu & \cdots & S_{2\nu-2}
\end{bmatrix}
\begin{bmatrix} \Lambda_\nu \\ \Lambda_{\nu-1} \\ \vdots \\ \Lambda_1 \end{bmatrix}
=
\begin{bmatrix} -S_\nu \\ -S_{\nu+1} \\ \vdots \\ -S_{2\nu-1} \end{bmatrix}.
\]
Now we finally have something that can be solved. Put the above coefficients into a ν × (ν+1) augmented matrix and run it through Gauss-Jordan elimination. If the system of linear equations is inconsistent, then there are too many errors in the codeword and they cannot be repaired at the moment. If ν is not at the maximum possible value, then it might be possible to fix the codeword by re-running the procedure with a larger ν. Otherwise the codeword cannot be fixed in any way at all.
We need to take special care if the linear system is consistent but under-determined. This happens when the codeword contains fewer than ν errors, because any of the locations where the error value is zero could be selected as a virtual “error”. Each different set of error location indexes {I_i | 0 ≤ i < ν} (unknown right now) would produce a different error locator polynomial Λ(x). For example, if the set of true error locations is {2, 5} and we want to find exactly 3 error locations, then {0, 2, 5}, {2, 4, 5}, etc. are all equally valid solutions.
The key insight is that we only need some particular solution to the linear system. We don’t care about parametrizing the space of all solutions or anything else. One approach is to treat all the dependent variables as zero. When we scan each row of the matrix in reduced row echelon form (RREF), we look at the column of the leftmost non-zero coefficient. If that column is the rightmost one, then the linear system is inconsistent. Otherwise, the i-th column corresponds to the i-th variable in the linear system, which corresponds to the coefficient Λ_(ν−i). By setting the dependent variables (the columns without pivots) to zero, we don’t need to adjust the values of any other variables.
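A particular solution with the free variables set to zero can be extracted from the RREF exactly as described. This sketch implements Gauss-Jordan elimination over the assumed example prime field ℤ_73 (inverses via Fermat's little theorem), returning None on an inconsistent system.

```python
def solve_particular(aug, p=73):
    """Gauss-Jordan on an augmented matrix (list of rows, last entry of each
    row is the right-hand side) over Z_p. Returns a particular solution with
    all free (non-pivot) variables set to 0, or None if inconsistent."""
    rows, cols = len(aug), len(aug[0])
    aug = [row[:] for row in aug]
    pivot_cols = []
    r = 0
    for c in range(cols - 1):
        # Find a row with a non-zero entry in column c, at or below row r.
        pr = next((i for i in range(r, rows) if aug[i][c] % p != 0), None)
        if pr is None:
            continue                              # no pivot: free variable
        aug[r], aug[pr] = aug[pr], aug[r]
        inv = pow(aug[r][c], p - 2, p)            # multiplicative inverse mod p
        aug[r] = [(v * inv) % p for v in aug[r]]
        for i in range(rows):
            if i != r and aug[i][c] % p != 0:
                f = aug[i][c]
                aug[i] = [(vi - f * vr) % p for vi, vr in zip(aug[i], aug[r])]
        pivot_cols.append(c)
        r += 1
    # A leftover row of the form (0 ... 0 | non-zero) means inconsistency.
    if any(all(v % p == 0 for v in row[:-1]) and row[-1] % p != 0
           for row in aug):
        return None
    x = [0] * (cols - 1)                          # free variables stay zero
    for i, c in enumerate(pivot_cols):
        x[c] = aug[i][-1]
    return x
```

For instance, `solve_particular([[1, 1, 5], [2, 2, 10]])` is under-determined; the free variable is set to zero and the result is `[5, 0]`.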
Once we obtain the coefficients Λ_1, Λ_2, …, Λ_ν, we can evaluate the polynomial Λ(x) at any point we want. Plug in the values x = α^(−i) for 0 ≤ i < n and check whether Λ(α^(−i)) = 0. For the solutions found, put the i values into the variables I_0, I_1, etc.
Note that we may find any number of solutions, anywhere between 0 and n inclusive. If the number of solutions found is more than ν, then it is generally impossible to recover from the errors. If the number of solutions found is less than ν, then we simply redefine ν to this lower number for the remaining part of the decoder algorithm, and delete the higher-numbered I_i variables. The solutions can be found and saved into the I_i variables in any order.
It’s unnecessary to test values of i that are at least n, because errors cannot occur outside of valid indexes into the codeword. However, it’s possible for solutions of Λ(α^(−i)) = 0 to exist where i ≥ n. Not only can these solutions not be used, they imply that the error-correcting capability has been exceeded, because we can’t derive self-consistent information about where these errors are located. Thus this situation counts as an early failure.
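Once the Λ coefficients are known, the search over candidate locations is a brute-force evaluation. The sketch below again assumes the example field ℤ_73 with α = 5, and checks itself against a Λ(x) built directly from two hypothetical error locations.

```python
def error_locations(lam, n, p=73, alpha=5):
    """Given Lambda(x) coefficients [1, L1, ..., Lnu] (low-to-high), return all
    indexes 0 <= i < n with Lambda(alpha^(-i)) == 0."""
    locs = []
    for i in range(n):
        x = pow(alpha, (-i) % (p - 1), p)         # alpha^(-i) in the field
        if sum(c * pow(x, j, p) for j, c in enumerate(lam)) % p == 0:
            locs.append(i)
    return locs

# Sanity check: build Lambda(x) = (1 - X0*x)(1 - X1*x) directly for the
# (hypothetical) true error locations {2, 5}, then confirm the search finds them.
p, alpha, n = 73, 5, 31
X0, X1 = pow(alpha, 2, p), pow(alpha, 5, p)
lam = [1, (-(X0 + X1)) % p, (X0 * X1) % p]
assert error_locations(lam, n) == [2, 5]
```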
Calculating error values
At this point, ν might have been redefined as a smaller number, and we know the error location indexes I_0, I_1, ..., I_(ν−1). Because of this, we know all the values X_i = α^(I_i) for 0 ≤ i < ν.
Since we know the X_i values and S_i values, we can solve one of the earlier equations for the vector of values Y_i = e_(I_i) to obtain the error values/magnitudes:
\[
\begin{bmatrix}
X_0^0 & X_1^0 & \cdots & X_{\nu-1}^0 \\
X_0^1 & X_1^1 & \cdots & X_{\nu-1}^1 \\
\vdots & \vdots & \ddots & \vdots \\
X_0^{m-1} & X_1^{m-1} & \cdots & X_{\nu-1}^{m-1}
\end{bmatrix}
\begin{bmatrix} Y_0 \\ Y_1 \\ \vdots \\ Y_{\nu-1} \end{bmatrix}
=
\begin{bmatrix} S_0 \\ S_1 \\ \vdots \\ S_{m-1} \end{bmatrix}.
\]
We simply run a Gauss-Jordan elimination algorithm here. If the linear system is consistent and independent, then a unique solution exists and we are going to finish successfully. Otherwise the system is inconsistent so it is impossible to satisfy all the syndrome constraints, or the system is dependent/under-determined so there is no unique solution (even though one is required).
With the error locations I_i and error values Y_i known, we can attempt to fix the received codeword. Let the repaired codeword polynomial be r′(x) = r′_0 x^0 + r′_1 x^1 + ⋯ + r′_(n−1) x^(n−1). For 0 ≤ i < ν, we set the coefficient r′_(I_i) = r_(I_i) − Y_i = r_(I_i) − e_(I_i). For each index i where 0 ≤ i < n and i is not present in the set {I_j | 0 ≤ j < ν}, we simply copy the value r′_i = r_i (i.e. don’t change the codeword values at locations not identified as errors).
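The final repair step is just a subtraction at each identified location. As a sketch (field ℤ_73 assumed, with hypothetical error locations and values):

```python
def repair(r, locs, vals, p=73):
    """Return the repaired codeword r' with r'_{I_i} = r_{I_i} - Y_i, and all
    other positions copied unchanged."""
    rp = list(r)
    for i, y in zip(locs, vals):
        rp[i] = (rp[i] - y) % p
    return rp

# Hypothetical example: errors of value 9 and 20 at indexes 2 and 5.
received = [3] * 10
fixed = repair(received, [2, 5], [9, 20])
assert fixed[2] == (3 - 9) % 73 and fixed[5] == (3 - 20) % 73
assert all(fixed[i] == 3 for i in range(10) if i not in (2, 5))
```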
By design, the repaired codeword polynomial r′(x) will have all-zero syndrome values; if it wouldn’t, then the matrix-solving process would have identified the linear system as inconsistent, with no solution. We can still check the syndromes out of paranoia, but at this point we’re done.
This decoded codeword is the best guess based on the received codeword. If the number of errors introduced into the codeword is at most ν, then the decoding process is guaranteed to succeed and yield the original message. Otherwise, for a received codeword with too many errors, all outcomes are possible: the decoding process might explicitly indicate a failure (most likely), a valid message might be decoded that mismatches the original message (occasionally), or the correct message might be recovered (very unlikely).
Notes
Time complexity
The encoder is short and simple, and runs in Θ(mk) time. It is unlikely that the encoder can be significantly improved in conciseness or speed.
The PGZ decoder algorithm described here runs in Θ(m^3 + mk) time, assuming we choose the maximum error-correcting capability of ν = ⌊m/2⌋. This cubic runtime is not ideal, and can be improved to quadratic with other algorithms. The Berlekamp-Massey algorithm can find the error locator polynomial in Θ(m^2) time, and the Forney algorithm can find the error values also in Θ(m^2) time.
Alternatives to a generator
The algorithm described here uses powers of the generator α starting at 0, i.e. (α^0, α^1, …, α^(m−1)). Some variants of Reed-Solomon ECC use powers starting at 1, i.e. (α^1, α^2, …, α^m).
In fact, the algorithm doesn’t seem to need powers or a generator at all. It appears to work as long as we can choose m unique non-zero values in the field F. For one, this means we can apply RS ECC in infinite fields such as the rational numbers ℚ (for pedagogical but not practical purposes), because infinite fields never have a multiplicative generator.
Although I haven’t modified the math and code to show that it works, it should be possible to adjust them to accommodate a set of unique values instead of generator powers. Suppose we have a sequence of m unique non-zero values named (α_0, α_1, …, α_(m−1)). Then, going through the mathematical derivation, we would replace every instance of α^i with α_i, and it should probably all work out.