Computer Chess Club Archives



Subject: Re: Shortening Huffman coding + OT

Author: Ratko V Tomic

Date: 09:06:48 11/01/99



>>bits for the fully occupied board (without captures, promotions, castle or ep
>>status bits) is 82+64=146 bits...
>
>
>Different dispositions of pawns:   48! / (32!*8!*8!)
>
>Different dispositions of remaining pieces:  48! / (32!*(2!)^6)
>(Here 48 = 64-16 are the squares not occupied by pawns)
>
>Total dispositions:   (48!/(32!*8!))^2 / 64  = 2.14 10^40  ===> 134 bits
>

You are right, I had ignored all position legality constraints in order to get
a quick upper bound for the entropy of the full board. But, as your calculation
shows, just removing the primary illegal placements of the pawns saves 12 bits
for the full board without perceptibly complicating the calculations. The 134
bits is still an upper bound (since other illegal configurations, such as
checks against the wrong side or positions without a legal predecessor, are
ignored), but a much tighter one, making it more likely that all positions can
be fit in 160 bits. Now, with 160-134=26 spare bits, that goal seems quite
feasible.
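
As a quick check of the arithmetic, here is a small Python sketch, using exact
integer arithmetic, which just reproduces your two factors and the resulting
bit count:

  from math import factorial

  # 8 white + 8 black pawns on the 48 squares of ranks 2-7
  pawns = factorial(48) // (factorial(32) * factorial(8) * factorial(8))

  # remaining 16 pieces on the 48 squares left free by the pawns;
  # (2!)^6 divides out swaps within the identical R, B and N pairs
  pieces = factorial(48) // (factorial(32) * 2**6)

  total = pawns * pieces          # about 2.14 * 10^40 dispositions
  print(total)
  print(total.bit_length())       # 134, i.e. ceil(log2(total)) here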

> I hope to give an help to your arguments, but the problems arise with
> promotions and the worst case is when 12 promotions (if 12 is the max)
> are equally divided among White and Black and make almost equal the
> number of Q, R, N and B in each side (e.g. 4, 3, 3, 3).

Yes, that seems to be roughly the worst case, although I would switch the count
of 4 from Queen to Knight (i.e. from the highest-mobility to the lowest-mobility
piece), since that would produce fewer possibilities for an illegal position due
to illegal checks, and thus result in more legal positions. One would need a
program to go through all possible material variations and add up the numbers of
distinct permutations for each of them, since this doesn't seem like a problem
suitable for manual calculation: even without promotions, the captures added
236195 new material-balance cases to the full-board one, and the positions with
promotions might increase this tenfold or a hundredfold.
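
To give an idea of what such a program would tally, here is a rough Python
sketch of the per-case count; only the pawn-rank restriction is enforced and
everything else about legality is ignored (as in your calculation above), and
the function name is of course just made up for illustration:

  from math import comb, factorial

  def raw_placements(wp, bp, piece_counts):
      # wp, bp: white/black pawn counts (confined to the 48 squares of ranks 2-7)
      # piece_counts: multiplicities of the non-pawn pieces of both sides,
      #   e.g. [1,1,2,2,2, 1,1,2,2,2] for K,Q,R,B,N of each side (no promotions)
      n = comb(48, wp) * comb(48 - wp, bp)          # pawn placements
      free = 64 - wp - bp                           # squares left for the rest
      k = sum(piece_counts)
      n *= factorial(free) // factorial(free - k)   # ordered placements
      for c in piece_counts:
          n //= factorial(c)                        # identical pieces swap freely
      return n

  # full board without promotions reproduces the 2.14*10^40 figure above;
  # the 12-promotion case mentioned above would be e.g.
  # raw_placements(2, 2, [1,4,3,3,3, 1,4,3,3,3])
  print(raw_placements(8, 8, [1, 1, 2, 2, 2, 1, 1, 2, 2, 2]))

A driver looping over all the material cases and summing these counts would
then give the refined upper bound on the number of positions.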

As to the encoding strategy, suppose you have an algorithm to generate a list of
all legal positions, and denote the number of legal positions NLP. The number of
bits needed to index any position in the list is NB=[log2(NLP)]+ (where []+
denotes rounding up to the next integer, i.e. ceil() in C). This is the optimal
worst-case encoding. Any other encoding which uses fewer than NB bits for some
positions will automatically use more than NB bits for some other positions,
thus lengthening the worst-case code. (Although NLP & NB refer to the ideal case
of strictly legal positions, this conclusion about the worst-case sub-optimality
of variable-length codes holds even if you ignore some types of illegality, as
long as you ignore the same types of illegality in both methods, the raw
enumeration method and the variable-length code method.)
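
In toy form, assuming such an enumeration exists (a plain Python list stands in
for it below, which is only illustrative since the real list is astronomically
large), the raw indexing scheme is simply:

  from math import ceil, log2

  def encode(position, legal_positions):
      NLP = len(legal_positions)
      NB = ceil(log2(NLP))                 # every position costs exactly NB bits
      index = legal_positions.index(position)
      return format(index, '0%db' % NB)    # fixed-length NB-bit string

  def decode(bits, legal_positions):
      return legal_positions[int(bits, 2)]

In practice the explicit list would be replaced by a ranking/unranking
computation on the position itself, but the bit count per position stays the
same.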

Therefore, for this particular objective (seeking the shortest encoding for the
worst-case position/s), one has to avoid the temptation of needlessly shortening
the codes for positions which don't exceed the 160 bits. Even more
counter-intuitively, whenever one finds a type of position which encodes into
fewer than 160 bits, one should purposefully look for ways to change the
encoding method to _increase_ the length of these short codes (of course,
without adding any superfluous redundancy), to get as close as possible to the
160 bits. The farther below 160 some type of position falls, the more it needs
lengthening. It goes against the grain, as it were, when one is in the
"compressing" mind-set, to think up ways to lengthen the encoding.

For example, when we were looking at the king pair + castle encoding and got it
down to 12 bits, we should instead have been looking for some non-redundant
encoding method which would stretch it toward the 160 bits. Now, that's an
interesting goal.
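
For instance, the king-pair part of that 12-bit figure can be checked directly;
the sketch below (which leaves out the castle-status part of that encoding)
just counts the ordered king placements that are neither coincident nor
adjacent:

  from math import ceil, log2

  def king_pairs():
      # ordered (white king square, black king square) placements, 0..63,
      # excluding same-square and adjacent-king configurations
      count = 0
      for wk in range(64):
          for bk in range(64):
              if wk == bk:
                  continue
              if abs(wk // 8 - bk // 8) <= 1 and abs(wk % 8 - bk % 8) <= 1:
                  continue                  # kings adjacent: illegal
              count += 1
      return count

  n = king_pairs()
  print(n, ceil(log2(n)))                   # 3612 placements -> 12 bits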

The more evenly the average codes approach 160 bits from below, the better the
opportunities for the worst-case positions to fit in 160 bits. Since we don't
actually know whether 160 is the right target (it may be too low or too high),
the more general strategy might be to vary the encoding methods with the aim of
minimizing the difference between the longest and the shortest cases known at a
given stage.



