Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Arithmetic coding of chess piece positions

Author: Vincent Diepeveen

Date: 08:21:44 11/15/99

Go up one level in this thread


Cool Arithmetic coding, by far the best way of compressing things,
but very CPU intensive.

What is so far the worst case in a chessposition you
encountered, how many bits do you need for a position at maximum?

Greetings,
Vincent



On November 15, 1999 at 10:13:17, Larry Serflaten wrote:

>At the invitation of "Les", I will attempt to describe the Arithmetic coding
>technique as it pertains to storing the chess piece positions on a chessboard.
>Further down is source code in Visual Basic for those who may want to study
>the implementation (so far).
>
>Encoding:
>
>This method uses numbers in the range of 0 to 1, so that, any output from the
>encoding process will always be a decimal (fractional) value.  Where that value
>falls in the range of 0 to 1, has an infinate number of possibilities, because
>we can always add more numbers to the end of the decimal value, (for more
>encoded symbols) and still reside between 0 and 1.
>
>Between 0 and 1, the entire list of symbols is assigned a range of values.
>These symbols represent the different chess pieces, SPACE, Castle enabled
>Rooks, and En passant Pawns, etc.  Reading from the board sequencially
>(A8 - H1) the SPACE or other symbol is sent to ecoding depending on what piece
>occupies the curent square.  The encoded message would always contain the 64
>symbols that indicate what occupies each square.
>
>If, for example, there are 20 (for convenience) symbols, each symbol would map
>to 5% or .05 of the entire range (0 to 1).  The assignment might start out as
>follows:
>
>  SPACE      .0  -  .049999...
>W King       .05 -  .099999...
>W Queen      .10 -  .149999...
>W Bishop     .15 -  .199999...
>W Knight     .20 -  .249999...
>W Rook       .25 -  .299999...
>...
>Black's Turn .90 -  .949999...
>End of Data  .95 -  .999999...  (.9999...  Is 'identical' to 1)
>
>Note that we can add 0's after the low range values and still maintain accurate
>precision.  Likewise for the upper range values, we can add 9's on to the end
>of those numbers and get even more precise results.   This will later allow us
>to add more precision, as more and more symbols are encoded.
>
>After encoding the first symbol (W Rook for example) the range is narrowed to
>whatever size W Rook's range is and the 20 symbols are remapped into the new
>range:
>
>W Rook's range was .25 to .299999..., now all the symbols must fit within that
>range and are remapped giving each 5% of the new range:
>
>  SPACE      .2500  -  .252499...
>W King       .2525  -  .254999...
>W Queen      .2550  -  .257499...
>W Bishop     .2575  -  .259999...
>W Knight     .2600  -  .262499...
>
>If the next symbol is the W Knight, the new range would be from .26 to .262499.
>and all the symbols would be remapped to the new range.  If for example, this
>was the last (64th) symbol, the output is the lower value, IE: .26
>
>Decode:
>
>Starting again with the original symbol ranges, we can determine what the first
>symbol is by finding out what range the coded number falls into:  .26 falls
>between .25 and .29999... which is the range assigned to W Rook.  If we
>subtract out W Rook's lower range value, (.26 - .25 = .01) and divide by the
>range size (.05) we get a new number (.01 / .05 = .2) which is again matched
>against the symbols' ranges.  .2 belongs to W Knight, which was the second
>symbol encoded.
>
>Subtracting out W Knight's .2 leaves us with 0, so either a SPACE is the next
>symbol, or we're done with decoding.  Using an End Of Data symbol allows us to
>be certain where the end is.  Another method could rely on there being exactly
>64 symbols....
>
>Comments:
>
>If, after encoding all 64 symbols, the resulting number could be accurately
>represented in a (VB) Double type, the actual bit usage could be reduced to
>the 64 bits needed to store the Double type.  In acuality, the 64 bits would
>be read as one 64 bit binary fraction, where the decimal point is assumed to
>be on the left side instead of the right, allowing the 64 bits to represent
>the encoded (fractional) value.
>
>The trouble with implementing this method is that the floating point math
>packages in computers today are not equipped to handle numbers with many
>significant digits.  Some means of working on lengthy numbers is needed and
>improvements may include using Byte arrays to store each range value.
>
>For the compression buffs, this method offers better entropy than Huffman
>coding in that for a symbol that is repeated often, its 'optimal' bit size
>would be somewhat less than one.  Huffman coding would probably assign one bit
>for the symbol, which could be 3 or 4 times more than the optimal value.  The
>Arithmetic coding method gets closer to achieving the optimal bit sizes and
>thus better entropy.   How you figure 'optimal' size and other such intracacies
>are beyond my experience, but you can read about them in 'The Data Compression
>Book' by Mark Nelson / M & T Books, (algorithms in ANSI C) from which the
>following code was adapted.
>
>Source Code:
>
>The demo below uses a form of arithmetic coding using floating point math on a
>Double Precision value.  In its current state, it is only good for about 14 or
>15 symbols, and sometimes less.
>
>On a new form, add a command button and 3 text boxes.  Using the letters A-S
>(T is the End of Data symbol) enter a string into Text1 and click on the
>button....
>
>Option Explicit
>Dim Model(0 To 20) As Double
>
>Private Sub Command1_Click()
>Dim Code As Double
>    Code = Encode(CStr(Text1) & "T") 'Appending EOD
>    Text2 = CStr(Code)
>    Text3 = Decode(Code)
>End Sub
>
>Private Sub Form_Load()
>Dim i&, k#
>    'Store Ranges for A-S (and EOD)
>    For i = 0 To 20
>        Model(i) = k
>        k = k + 0.05  ' = 1/20 of entire (20 symbol) table
>    Next i
>End Sub
>
>Private Function Encode(DataIn$) As Double
>Dim Hi As Double, Lo As Double, Range As Double
>Dim i As Byte, sym As Byte
>
>    Hi = 1#
>    Lo = 0
>    DataIn = UCase$(DataIn)
>    For i = 1 To Len(DataIn)
>        sym = Asc(Mid$(DataIn, i, 1)) - 64
>        'Trap input errors
>        If BadInput(sym, i) Then
>            Encode = 0.99999
>            Exit Function
>        End If
>        'Symbol OK - Reset new range values
>        Range = Hi - Lo
>        Hi = Lo + (Range * Model(sym))
>        Lo = Lo + (Range * Model(sym - 1))
>    Next i
>    Encode = Lo
>End Function
>
>Private Function Decode(Data As Double) As String
>Dim i As Byte, sym As Byte
>    Do
>        sym = FindSymbol(Data)
>        If sym = 20 Then Exit Do 'End Of Data symbol
>        'Subtract out the Lo value and divide by the range size
>        Data = (Data - Model(sym - 1)) / 0.05
>        Decode = Decode & Chr(sym + 64)
>    Loop
>End Function
>
>Private Function FindSymbol(ByVal X As Double) As Byte
>   'Where does X fall on the range of 0 - .9999 ...
>    FindSymbol = 1
>    Do While Model(FindSymbol) < X
>        FindSymbol = FindSymbol + 1
>    Loop
>End Function
>
>Private Function BadInput(ByVal s, ByVal i) As Boolean
>        If s > 20 Then
>            MsgBox "Input error"
>            'Highlight offensive input!
>            With Text1
>                .SelStart = i - 1
>                .SelLength = 1
>                .SetFocus
>            End With
>            BadInput = True
>        End If
>End Function



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.