Computer Chess Club Archives


Search

Terms

Messages

Subject: Arithmetic coding of chess piece positions

Author: Larry Serflaten

Date: 07:13:17 11/15/99


At the invitation of "Les", I will attempt to describe the Arithmetic coding
technique as it pertains to storing the chess piece positions on a chessboard.
Further down is source code in Visual Basic for those who may want to study
the implementation (so far).

Encoding:

This method uses numbers in the range of 0 to 1, so that, any output from the
encoding process will always be a decimal (fractional) value.  Where that value
falls in the range of 0 to 1, has an infinate number of possibilities, because
we can always add more numbers to the end of the decimal value, (for more
encoded symbols) and still reside between 0 and 1.

Between 0 and 1, the entire list of symbols is assigned a range of values.
These symbols represent the different chess pieces, SPACE, Castle enabled
Rooks, and En passant Pawns, etc.  Reading from the board sequencially
(A8 - H1) the SPACE or other symbol is sent to ecoding depending on what piece
occupies the curent square.  The encoded message would always contain the 64
symbols that indicate what occupies each square.

If, for example, there are 20 (for convenience) symbols, each symbol would map
to 5% or .05 of the entire range (0 to 1).  The assignment might start out as
follows:

  SPACE      .0  -  .049999...
W King       .05 -  .099999...
W Queen      .10 -  .149999...
W Bishop     .15 -  .199999...
W Knight     .20 -  .249999...
W Rook       .25 -  .299999...
...
Black's Turn .90 -  .949999...
End of Data  .95 -  .999999...  (.9999...  Is 'identical' to 1)

Note that we can add 0's after the low range values and still maintain accurate
precision.  Likewise for the upper range values, we can add 9's on to the end
of those numbers and get even more precise results.   This will later allow us
to add more precision, as more and more symbols are encoded.

After encoding the first symbol (W Rook for example) the range is narrowed to
whatever size W Rook's range is and the 20 symbols are remapped into the new
range:

W Rook's range was .25 to .299999..., now all the symbols must fit within that
range and are remapped giving each 5% of the new range:

  SPACE      .2500  -  .252499...
W King       .2525  -  .254999...
W Queen      .2550  -  .257499...
W Bishop     .2575  -  .259999...
W Knight     .2600  -  .262499...

If the next symbol is the W Knight, the new range would be from .26 to .262499.
and all the symbols would be remapped to the new range.  If for example, this
was the last (64th) symbol, the output is the lower value, IE: .26

Decode:

Starting again with the original symbol ranges, we can determine what the first
symbol is by finding out what range the coded number falls into:  .26 falls
between .25 and .29999... which is the range assigned to W Rook.  If we
subtract out W Rook's lower range value, (.26 - .25 = .01) and divide by the
range size (.05) we get a new number (.01 / .05 = .2) which is again matched
against the symbols' ranges.  .2 belongs to W Knight, which was the second
symbol encoded.

Subtracting out W Knight's .2 leaves us with 0, so either a SPACE is the next
symbol, or we're done with decoding.  Using an End Of Data symbol allows us to
be certain where the end is.  Another method could rely on there being exactly
64 symbols....

Comments:

If, after encoding all 64 symbols, the resulting number could be accurately
represented in a (VB) Double type, the actual bit usage could be reduced to
the 64 bits needed to store the Double type.  In acuality, the 64 bits would
be read as one 64 bit binary fraction, where the decimal point is assumed to
be on the left side instead of the right, allowing the 64 bits to represent
the encoded (fractional) value.

The trouble with implementing this method is that the floating point math
packages in computers today are not equipped to handle numbers with many
significant digits.  Some means of working on lengthy numbers is needed and
improvements may include using Byte arrays to store each range value.

For the compression buffs, this method offers better entropy than Huffman
coding in that for a symbol that is repeated often, its 'optimal' bit size
would be somewhat less than one.  Huffman coding would probably assign one bit
for the symbol, which could be 3 or 4 times more than the optimal value.  The
Arithmetic coding method gets closer to achieving the optimal bit sizes and
thus better entropy.   How you figure 'optimal' size and other such intracacies
are beyond my experience, but you can read about them in 'The Data Compression
Book' by Mark Nelson / M & T Books, (algorithms in ANSI C) from which the
following code was adapted.

Source Code:

The demo below uses a form of arithmetic coding using floating point math on a
Double Precision value.  In its current state, it is only good for about 14 or
15 symbols, and sometimes less.

On a new form, add a command button and 3 text boxes.  Using the letters A-S
(T is the End of Data symbol) enter a string into Text1 and click on the
button....

Option Explicit
Dim Model(0 To 20) As Double

Private Sub Command1_Click()
Dim Code As Double
    Code = Encode(CStr(Text1) & "T") 'Appending EOD
    Text2 = CStr(Code)
    Text3 = Decode(Code)
End Sub

Private Sub Form_Load()
Dim i&, k#
    'Store Ranges for A-S (and EOD)
    For i = 0 To 20
        Model(i) = k
        k = k + 0.05  ' = 1/20 of entire (20 symbol) table
    Next i
End Sub

Private Function Encode(DataIn$) As Double
Dim Hi As Double, Lo As Double, Range As Double
Dim i As Byte, sym As Byte

    Hi = 1#
    Lo = 0
    DataIn = UCase$(DataIn)
    For i = 1 To Len(DataIn)
        sym = Asc(Mid$(DataIn, i, 1)) - 64
        'Trap input errors
        If BadInput(sym, i) Then
            Encode = 0.99999
            Exit Function
        End If
        'Symbol OK - Reset new range values
        Range = Hi - Lo
        Hi = Lo + (Range * Model(sym))
        Lo = Lo + (Range * Model(sym - 1))
    Next i
    Encode = Lo
End Function

Private Function Decode(Data As Double) As String
Dim i As Byte, sym As Byte
    Do
        sym = FindSymbol(Data)
        If sym = 20 Then Exit Do 'End Of Data symbol
        'Subtract out the Lo value and divide by the range size
        Data = (Data - Model(sym - 1)) / 0.05
        Decode = Decode & Chr(sym + 64)
    Loop
End Function

Private Function FindSymbol(ByVal X As Double) As Byte
   'Where does X fall on the range of 0 - .9999 ...
    FindSymbol = 1
    Do While Model(FindSymbol) < X
        FindSymbol = FindSymbol + 1
    Loop
End Function

Private Function BadInput(ByVal s, ByVal i) As Boolean
        If s > 20 Then
            MsgBox "Input error"
            'Highlight offensive input!
            With Text1
                .SelStart = i - 1
                .SelLength = 1
                .SetFocus
            End With
            BadInput = True
        End If
End Function



This page took 0.02 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.