Computer Chess Club Archives


Search

Terms

Messages

Subject: Number of positions in chess -- the rest of the story

Author: Ratko V Tomic

Date: 05:12:12 01/03/06


> There is a better upper bound see:
>
> http://chessprogramming.org/cccsearch/ccc.php?art_id=77068
>
> Uri

Hi Uri, that little excercise in enumeration you brought up
(and mentioned in the other thread) set me off back then to
try make it work as a general purpose compression algorithm.
While a neat idea on paper, the problem was that the arithmetic
precision had to be of the size of output.

After struggling for a while, I searched the literture and
it turned out such compression algorithm already existed,
called "Enumerative Coding", since 1960s (first in Russian
literature, from Kolmogorov and his disciples, then shortly
thereafter here, in USA, from Lynch and Davisson). And, as
in my version, the precision problem was still unsolved after
over four decades of various attempts to make the algorithm
practical.

Since I arrived at it on my own, my conventions for
enumeration happened to be backwards from those that
existed in the literature (mine sorted right to left,
the so-called colex sorting of combinations, and built
up the enumerative addends bottom up, while the standard
scheme sorted lexicographically worked top down). Further,
due to my playing with lattice methods in QCD (in my
physics graduate school days), I also had my own visual
representation of combinatorics as lattice walks, which
is very intuitive, heuristically rewarding way of looking
at it, allowing one to see all of the combinatorial
identities at a glance (that, too, turned out had existed
in the EC literature, although not in as general or flexible
formulation as mine and lacking the convenient notation
for the lattice walks I worked out while doing physics).

Since that thread, I kept returning to the problem, on and
off, trying various ideas. Nothing worked. Then, in summer
2004, when my wife and kids went to a summer camp for a week,
I stayed home to finish a programming project (a video codec).
The first night home alone, at about 2AM, while debugging
the latest batch of code, out of nowhere an idea popped into
my head on that pesky enumeration problem, something I didn't
yet try. I quickly coded just a toy version, allowing input
buffers of 32 bits only, and by morning it worked -- a version
using arithmetic precision of only 8 bits encoded & decoded
correctly all 2^32 possible inputs.

That same early Sunday morning, I woke up the company owner
(I am a CTO & a chief scientist), and he, still half awake,
yielded to my enthusism and agreed to suspend the original
project, so I could try if the idea works on data of any size.
At the time I didn't have proof, and wasn't even completely sure
that it can always be decoded. I also didn't know what the
maximum or average redundancy would result from the reduced
precision. Within a week I had a simple version of code working
on buffers up to 4 kilobits, using only 16 bit arithmetic precision
(instead of 4 kilobit precision). It worked again, and it was
very fast, even in that crude version. The redundancy due to
the limited precision arithmetic was measured and it was on
average about half a bit (and always within 1 bit from entropy)
for the entire 4k block.

The next couple months I extended the algorithm to any input
size and to general alphabet (from the original binary alphabet).
I also found a proof of general decodability and an expression
for the max redundancy (due to finite precision). The max
redundancy is always below log(e)/2^(g-1) for g bit arithmetic
precision (I use now g=32 bit precision). The fourty years old
puzzle was finally cracked. The accidental backwards conventions
of my initial approach turned out to be the critical element
exposing the key to the solution, which is virtually impossible
to spot from within the conventional enumerative coding scheme.

I also developed a new modeling scheme for combinatorial methods
of coding (such as the new algorithm) which is quite promising
on its own. It is basically a scheme along the lines of Kolmogorov's
algorithmic approach to information theory (in contrast to Shannon's
probabilistic approach, which is dominant at present, and where
modeling for arithmetic coding consists in calculating the
probabilities of the next single symbol).

The algorithm, which I named "Quantized Indexing", turned out
pretty amazing. It codes always tighter than the present best
entropy coding algorithm, the Arithmetic Coding (which, it
turns out is only a particular approximation of QI), yet it
codes much faster than AC due to using only a simple table
add (of a machine size word) for the less frequent symbol
and no coding operations for the most frequent symbol (AC
needs coding operations for both types of symbols, and more
expensive operations at that, such as multiplies & divides).

As result, QI runs typically 10-20 times faster than the
fastest full Arithmetic Coder implementations (and always at
least 6 times faster, which occurs in high entropy limit,
while for very sparse data, the low entropy limit, QI runs
well over 200 times faster).

Recently, I posted a preprint about the algorithm to arXiv:

   http://arxiv.org/abs/cs.IT/0511057

and also created a web page with additional more detailed
technical reports and the C source code:

   http://www.1stworks.com/ref/qi.htm

A nice little surprise came when I emailed about the
preprint to Jorma Rissanen, the inventor of arithmetic coding
himself. He had struggled for several years with the very
same Enumerative Coding precision problem, inventing in the
process Arithmetic Coding as a compromise solution (in 1976,
while working for IBM). Although busy with a lecture tour,
he read the paper right away and was quite pleased to see
his old problem solved at last and a bit surprised at how
"very clever" the solution turned out to be.

So, that's the chain of unlikely events triggered by your
original code snippet for enumerating the chess positions.




This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.