Author: Ratko V Tomic
Date: 05:12:12 01/03/06
> There is a better upper bound see:
>
> http://chessprogramming.org/cccsearch/ccc.php?art_id=77068
>
> Uri

Hi Uri,

That little exercise in enumeration you brought up (and mentioned in the other thread) set me off back then to try to make it work as a general-purpose compression algorithm. While a neat idea on paper, the problem was that the arithmetic precision had to be of the size of the output. After struggling with it for a while, I searched the literature, and it turned out such a compression algorithm had already existed since the 1960s, called "Enumerative Coding" (first in the Russian literature, from Kolmogorov and his disciples, then shortly thereafter here in the USA, from Lynch and Davisson). And, as in my version, the precision problem was still unsolved after over four decades of various attempts to make the algorithm practical.

Since I arrived at it on my own, my conventions for enumeration happened to be backwards from those in the literature: mine sorted right to left (the so-called colex sorting of combinations) and built up the enumerative addends bottom up, while the standard scheme sorted lexicographically and worked top down. Further, due to my playing with lattice methods in QCD (in my physics graduate school days), I also had my own visual representation of combinatorics as lattice walks, which is a very intuitive, heuristically rewarding way of looking at it, allowing one to see all of the combinatorial identities at a glance (that, too, it turned out, had existed in the EC literature, although not in as general or flexible a formulation as mine, and lacking the convenient notation for the lattice walks I had worked out while doing physics).

Since that thread, I kept returning to the problem, on and off, trying various ideas. Nothing worked. Then, in summer 2004, when my wife and kids went to a summer camp for a week, I stayed home to finish a programming project (a video codec). The first night home alone, at about 2AM, while debugging the latest batch of code, out of nowhere an idea popped into my head on that pesky enumeration problem, something I hadn't yet tried. I quickly coded just a toy version, allowing input buffers of 32 bits only, and by morning it worked -- a version using arithmetic precision of only 8 bits encoded & decoded correctly all 2^32 possible inputs. That same early Sunday morning, I woke up the company owner (I am the CTO & chief scientist), and he, still half awake, yielded to my enthusiasm and agreed to suspend the original project so I could try whether the idea works on data of any size. At the time I didn't have a proof, and wasn't even completely sure that it could always be decoded. I also didn't know what maximum or average redundancy would result from the reduced precision.

Within a week I had a simple version of the code working on buffers up to 4 kilobits, using only 16-bit arithmetic precision (instead of 4-kilobit precision). It worked again, and it was very fast, even in that crude version. The redundancy due to the limited-precision arithmetic was measured, and it was on average about half a bit (and always within 1 bit of entropy) for the entire 4k block. Over the next couple of months I extended the algorithm to any input size and to a general alphabet (from the original binary alphabet). I also found a proof of general decodability and an expression for the maximum redundancy (due to finite precision). The max redundancy is always below log(e)/2^(g-1) for g-bit arithmetic precision (I now use g=32 bit precision). The forty-year-old puzzle was finally cracked.
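For anyone curious what the unlimited-precision enumeration looks like concretely, here is a minimal toy sketch of my own (an illustration only, not code from the QI package) of plain colex enumerative coding of a binary string: the index is built bottom up by adding a binomial coefficient only at the positions of the ones. The zeros cost nothing, but the running index grows as large as the compressed output itself, which is exactly the precision problem described above. The names binom and colex_rank are just my labels for the sketch.

#include <stdio.h>
#include <stdint.h>

/* Exact binomial coefficient C(n, k); fine for a toy demo as long as it
   fits in 64 bits. A real coder would use a precomputed table. */
static uint64_t binom(unsigned n, unsigned k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;
    uint64_t c = 1;
    for (unsigned i = 1; i <= k; i++)
        c = c * (n - k + i) / i;   /* exact: each step yields C(n-k+i, i) */
    return c;
}

/* Colex rank of a binary string: the j-th one (j = 1, 2, ...) found at
   bit position p contributes C(p, j). Zeros add nothing -- only the ones
   trigger a table add. The catch: 'index' needs as many bits as the
   compressed output itself, which is the old precision problem. */
static uint64_t colex_rank(const char *bits)
{
    uint64_t index = 0;
    unsigned ones = 0;
    for (unsigned p = 0; bits[p]; p++) {
        if (bits[p] == '1') {
            ones++;
            index += binom(p, ones);
        }
    }
    return index;
}

int main(void)
{
    /* The six 4-bit strings with exactly two ones, in colex order 0..5. */
    const char *s[] = { "1100", "1010", "0110", "1001", "0101", "0011" };
    for (int i = 0; i < 6; i++)
        printf("%s -> index %llu\n", s[i], (unsigned long long)colex_rank(s[i]));
    return 0;
}

Decoding simply inverts this: given the index and the count of ones k, find the largest p with C(p, k) <= index, output a one at position p, subtract C(p, k), decrement k, and repeat. The part this sketch makes no attempt at is the one described below: keeping the table entries within a machine-size word instead of letting them grow without bound.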
The accidental backwards conventions of my initial approach turned out to be the critical element exposing the key to the solution, which is virtually impossible to spot from within the conventional enumerative coding scheme. I also developed a new modeling scheme for combinatorial methods of coding (such as the new algorithm), which is quite promising on its own. It is basically a scheme along the lines of Kolmogorov's algorithmic approach to information theory (in contrast to Shannon's probabilistic approach, which is dominant at present, and in which modeling for arithmetic coding consists in calculating the probabilities of the next single symbol).

The algorithm, which I named "Quantized Indexing", turned out pretty amazing. It always codes tighter than the present best entropy coding algorithm, Arithmetic Coding (which, it turns out, is only a particular approximation of QI), yet it codes much faster than AC, since it uses only a simple table add (of a machine-size word) for the less frequent symbol and no coding operations at all for the most frequent symbol (AC needs coding operations for both types of symbols, and more expensive operations at that, such as multiplies & divides). As a result, QI typically runs 10-20 times faster than the fastest full Arithmetic Coder implementations (and always at least 6 times faster, which occurs in the high-entropy limit, while for very sparse data, the low-entropy limit, QI runs well over 200 times faster).

Recently, I posted a preprint about the algorithm to arXiv:

http://arxiv.org/abs/cs.IT/0511057

and also created a web page with additional, more detailed technical reports and the C source code:

http://www.1stworks.com/ref/qi.htm

A nice little surprise came when I emailed the preprint to Jorma Rissanen, the inventor of arithmetic coding himself. He had struggled for several years with the very same Enumerative Coding precision problem, inventing Arithmetic Coding in the process as a compromise solution (in 1976, while working for IBM). Although busy with a lecture tour, he read the paper right away and was quite pleased to see his old problem solved at last, and a bit surprised at how "very clever" the solution turned out to be.

So, that's the chain of unlikely events triggered by your original code snippet for enumerating the chess positions.