Author: Vincent Diepeveen
Date: 08:21:44 11/15/99
Go up one level in this thread
Cool Arithmetic coding, by far the best way of compressing things, but very CPU intensive. What is so far the worst case in a chessposition you encountered, how many bits do you need for a position at maximum? Greetings, Vincent On November 15, 1999 at 10:13:17, Larry Serflaten wrote: >At the invitation of "Les", I will attempt to describe the Arithmetic coding >technique as it pertains to storing the chess piece positions on a chessboard. >Further down is source code in Visual Basic for those who may want to study >the implementation (so far). > >Encoding: > >This method uses numbers in the range of 0 to 1, so that, any output from the >encoding process will always be a decimal (fractional) value. Where that value >falls in the range of 0 to 1, has an infinate number of possibilities, because >we can always add more numbers to the end of the decimal value, (for more >encoded symbols) and still reside between 0 and 1. > >Between 0 and 1, the entire list of symbols is assigned a range of values. >These symbols represent the different chess pieces, SPACE, Castle enabled >Rooks, and En passant Pawns, etc. Reading from the board sequencially >(A8 - H1) the SPACE or other symbol is sent to ecoding depending on what piece >occupies the curent square. The encoded message would always contain the 64 >symbols that indicate what occupies each square. > >If, for example, there are 20 (for convenience) symbols, each symbol would map >to 5% or .05 of the entire range (0 to 1). The assignment might start out as >follows: > > SPACE .0 - .049999... >W King .05 - .099999... >W Queen .10 - .149999... >W Bishop .15 - .199999... >W Knight .20 - .249999... >W Rook .25 - .299999... >... >Black's Turn .90 - .949999... >End of Data .95 - .999999... (.9999... Is 'identical' to 1) > >Note that we can add 0's after the low range values and still maintain accurate >precision. Likewise for the upper range values, we can add 9's on to the end >of those numbers and get even more precise results. This will later allow us >to add more precision, as more and more symbols are encoded. > >After encoding the first symbol (W Rook for example) the range is narrowed to >whatever size W Rook's range is and the 20 symbols are remapped into the new >range: > >W Rook's range was .25 to .299999..., now all the symbols must fit within that >range and are remapped giving each 5% of the new range: > > SPACE .2500 - .252499... >W King .2525 - .254999... >W Queen .2550 - .257499... >W Bishop .2575 - .259999... >W Knight .2600 - .262499... > >If the next symbol is the W Knight, the new range would be from .26 to .262499. >and all the symbols would be remapped to the new range. If for example, this >was the last (64th) symbol, the output is the lower value, IE: .26 > >Decode: > >Starting again with the original symbol ranges, we can determine what the first >symbol is by finding out what range the coded number falls into: .26 falls >between .25 and .29999... which is the range assigned to W Rook. If we >subtract out W Rook's lower range value, (.26 - .25 = .01) and divide by the >range size (.05) we get a new number (.01 / .05 = .2) which is again matched >against the symbols' ranges. .2 belongs to W Knight, which was the second >symbol encoded. > >Subtracting out W Knight's .2 leaves us with 0, so either a SPACE is the next >symbol, or we're done with decoding. Using an End Of Data symbol allows us to >be certain where the end is. Another method could rely on there being exactly >64 symbols.... > >Comments: > >If, after encoding all 64 symbols, the resulting number could be accurately >represented in a (VB) Double type, the actual bit usage could be reduced to >the 64 bits needed to store the Double type. In acuality, the 64 bits would >be read as one 64 bit binary fraction, where the decimal point is assumed to >be on the left side instead of the right, allowing the 64 bits to represent >the encoded (fractional) value. > >The trouble with implementing this method is that the floating point math >packages in computers today are not equipped to handle numbers with many >significant digits. Some means of working on lengthy numbers is needed and >improvements may include using Byte arrays to store each range value. > >For the compression buffs, this method offers better entropy than Huffman >coding in that for a symbol that is repeated often, its 'optimal' bit size >would be somewhat less than one. Huffman coding would probably assign one bit >for the symbol, which could be 3 or 4 times more than the optimal value. The >Arithmetic coding method gets closer to achieving the optimal bit sizes and >thus better entropy. How you figure 'optimal' size and other such intracacies >are beyond my experience, but you can read about them in 'The Data Compression >Book' by Mark Nelson / M & T Books, (algorithms in ANSI C) from which the >following code was adapted. > >Source Code: > >The demo below uses a form of arithmetic coding using floating point math on a >Double Precision value. In its current state, it is only good for about 14 or >15 symbols, and sometimes less. > >On a new form, add a command button and 3 text boxes. Using the letters A-S >(T is the End of Data symbol) enter a string into Text1 and click on the >button.... > >Option Explicit >Dim Model(0 To 20) As Double > >Private Sub Command1_Click() >Dim Code As Double > Code = Encode(CStr(Text1) & "T") 'Appending EOD > Text2 = CStr(Code) > Text3 = Decode(Code) >End Sub > >Private Sub Form_Load() >Dim i&, k# > 'Store Ranges for A-S (and EOD) > For i = 0 To 20 > Model(i) = k > k = k + 0.05 ' = 1/20 of entire (20 symbol) table > Next i >End Sub > >Private Function Encode(DataIn$) As Double >Dim Hi As Double, Lo As Double, Range As Double >Dim i As Byte, sym As Byte > > Hi = 1# > Lo = 0 > DataIn = UCase$(DataIn) > For i = 1 To Len(DataIn) > sym = Asc(Mid$(DataIn, i, 1)) - 64 > 'Trap input errors > If BadInput(sym, i) Then > Encode = 0.99999 > Exit Function > End If > 'Symbol OK - Reset new range values > Range = Hi - Lo > Hi = Lo + (Range * Model(sym)) > Lo = Lo + (Range * Model(sym - 1)) > Next i > Encode = Lo >End Function > >Private Function Decode(Data As Double) As String >Dim i As Byte, sym As Byte > Do > sym = FindSymbol(Data) > If sym = 20 Then Exit Do 'End Of Data symbol > 'Subtract out the Lo value and divide by the range size > Data = (Data - Model(sym - 1)) / 0.05 > Decode = Decode & Chr(sym + 64) > Loop >End Function > >Private Function FindSymbol(ByVal X As Double) As Byte > 'Where does X fall on the range of 0 - .9999 ... > FindSymbol = 1 > Do While Model(FindSymbol) < X > FindSymbol = FindSymbol + 1 > Loop >End Function > >Private Function BadInput(ByVal s, ByVal i) As Boolean > If s > 20 Then > MsgBox "Input error" > 'Highlight offensive input! > With Text1 > .SelStart = i - 1 > .SelLength = 1 > .SetFocus > End With > BadInput = True > End If >End Function
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.