Computer Chess Club Archives



Subject: Re: Arithmetic coding of chess piece positions

Author: Ratko V Tomic

Date: 08:41:30 11/15/99



> this method offers better entropy than Huffman coding in that for a
> symbol that is repeated often, its 'optimal' bit size would be somewhat
> less than one.  Huffman coding would probably assign one bit for the
> symbol, which could be 3 or 4 times more than the optimal value.

The bit-roundoff effect that makes Huffman coding somewhat less efficient than
arithmetic coding is relatively small compared to the other variables that affect
compression. Moreover, even within Huffman codes the roundoff effect can easily
be overcome by redefining what the symbols are. For example, by blocking a
sequence of original data symbols into a single new symbol, and doing so for
many different sequences (thus enlarging the alphabet; a "symbol" need not be
a byte or even any particular number of data-stream bits, but could be a
relation among any number of data bits) until no single symbol in the enlarged
alphabet has a very large probability, you produce a symbol set where each
symbol may need tens of bits, making the bit-fraction problem practically
insignificant.
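To make the blocking argument concrete, here is a small Python sketch (the 0.9/0.1 source and all names are my own illustration): a binary source with p = 0.9 forces plain Huffman to spend a whole bit per symbol, while Huffman over pairs of symbols already drops to about 0.645 bits per symbol, against an entropy of roughly 0.469 bits.

```python
import heapq
from itertools import count, product

def huffman_lengths(probs):
    """Return Huffman code lengths for the given symbol probabilities."""
    tiebreak = count()
    # Heap items: (probability, tiebreaker, symbol indices in this subtree).
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:            # merging deepens every leaf below by 1
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), s1 + s2))
    return lengths

# Skewed binary source: symbol 0 with prob 0.9, symbol 1 with prob 0.1.
p = [0.9, 0.1]
# Unblocked: Huffman must spend a whole bit per symbol.
unblocked = sum(pi * li for pi, li in zip(p, huffman_lengths(p)))   # 1.0 bit

# Block pairs into super-symbols 00, 01, 10, 11 and code those instead.
pairs = [pa * pb for pa, pb in product(p, repeat=2)]
per_pair = sum(pi * li for pi, li in zip(pairs, huffman_lengths(pairs)))
blocked = per_pair / 2   # bits per original symbol: 1.29 / 2 = 0.645

print(unblocked, blocked)
```

Blocking triples or longer runs would push the rate still closer to the entropy, which is the point made above: the alphabet, not the coder, is where the fraction-of-a-bit loss lives.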

Additionally, the bit fractions that arithmetic codes gain are often meaningless
noise: artifacts of fluctuations in the finite sample that was used to build up
the probability tables. The problem with enlarging the sample (which might fix
these statistical errors) is that the tables lose local context sensitivity, so
they won't adapt quickly when the data suddenly changes.
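That trade-off is easy to see in a toy sketch (my own illustration, not a real coder): score an adaptive order-0 model by its ideal code length, sum of -log2 p per symbol. Counting over the whole history, the model pays heavily after an abrupt change in the statistics; the same model restricted to a short sliding window recovers within a few dozen symbols.

```python
import math

def ideal_code_length(data, window=None):
    """Total -log2 p(symbol) under an adaptive order-0 model with Laplace
    smoothing; `window` limits the counts to the most recent symbols."""
    total = 0.0
    history = []
    for sym in data:
        ctx = history if window is None else history[-window:]
        counts = {}
        for s in ctx:
            counts[s] = counts.get(s, 0) + 1
        # Laplace-smoothed probability over the binary alphabet {0, 1}.
        p = (counts.get(sym, 0) + 1) / (len(ctx) + 2)
        total += -math.log2(p)
        history.append(sym)
    return total

# Statistics flip abruptly halfway: mostly 0s, then mostly 1s.
data = [0] * 900 + [1] * 100 + [1] * 900 + [0] * 100

full  = ideal_code_length(data)             # counts over the whole history
local = ideal_code_length(data, window=64)  # counts over a short window

print(full, local)   # the windowed model codes the stream in fewer bits
```

The windowed model pays a small smoothing penalty on the stable stretches but adapts almost immediately after the flip, while the full-history model drags its stale counts along for hundreds of symbols.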

The power of arithmetic codes is exploited well only when they are employed in
conjunction with a good enough model, one as specific as possible to the given
type of data, which can generate truly accurate and locally sensitive
probability tables that may change entirely after each encoded symbol (even
here, though, the blocking method suggested above removes any practically
perceptible gain of the arithmetic coder over the Huffman coder).

Chess positions are particularly well suited to this kind of domain-specific
modelling. In normal game positions, and even in problem positions, there are
many patterns (from simple geometric ones like a fianchetto or rooks on open
files, to more complex ones that define things like the phase of the game)
which significantly skew the probabilities for the next symbol (which need not
be a piece on the next square; it could be a cluster of pieces over several
non-contiguous squares, or some general inter-piece relation, etc.). Dictionary
methods (which are generally very fast) that collect and recognize such
patterns would work well as a modelling stage, feeding their dictionary codes
for the patterns and their probabilities to the arithmetic or Huffman coder.
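A toy example of such a pattern-recognizing front end (entirely my own illustration): fold each maximal run of empty squares into a single dictionary symbol before handing the stream to an entropy coder. Even this crude "pattern" cuts the ideal order-0 code length of the starting position from 164 bits to roughly 106.

```python
import math
from collections import Counter

def order0_bits(symbols):
    """Ideal (entropy) code length in bits for an order-0 model
    built from the symbol stream itself."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(-c * math.log2(c / n) for c in counts.values())

# Standard chess starting position, one symbol per square ('.' = empty).
board = "rnbqkbnr" + "pppppppp" + "." * 32 + "PPPPPPPP" + "RNBQKBNR"

# Baseline: feed the coder one symbol per square.
per_square = order0_bits(list(board))          # exactly 164.0 bits

# Dictionary pass: each maximal run of empties becomes one symbol.
run_symbols = []
run = 0
for sq in board:
    if sq == ".":
        run += 1
    else:
        if run:
            run_symbols.append(("empty", run))
            run = 0
        run_symbols.append(sq)
if run:
    run_symbols.append(("empty", run))

with_runs = order0_bits(run_symbols)           # about 106 bits

print(per_square, with_runs)
```

A real modelling stage would of course use far richer patterns than empty runs, conditioned on game phase and piece relations as described above; the sketch only shows the mechanics of feeding dictionary symbols to the coder.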



