Computer Chess Club Archives



Subject: Re: Shortening Huffman coding + OT

Author: KarinsDad

Date: 10:27:09 11/01/99



On November 01, 1999 at 00:07:07, Ratko V Tomic wrote:

>>I am not discussing the best way to compress a series of positions into a
>>database or any other such mechanism.
>
>I see now, you need the shortest worst-case position. Your goal of 160 bits is
>just about as big as the entropy of the all chess positions. Namely, if, to
>start with, you take 32 regular pieces (without promotions and captures), you
>could first encode in 64 bits the occupied/empty square bitmap, then for the 32
>occupied squares you can look at all the permutations of the 32 symbol string
>KQRRBBNNPP...kqrr...pp which gives you NP (number of distinct permutations) =
>32! / (swap symmetry), where the swap symmetry removes double counting whenever
>identical pieces are exchanged among themselves, and which contains factors
>(2!)^6=64 for {R, B, N, r, b, n} and (8!)^2=1.6*10^9 (for pawns). So the total
>symmetry divisor is about 1.04*10^11, while 32!=2.6*10^35, which gives total
>distinct permutations of NP=2.52*10^24, which fits in 81.1 bits. So the total of
>bits for the fully occupied board (without captures, promotions, castle or ep
>status bits) is 82+64=146 bits, leaving 16 spare bits to cover other cases (i.e.
>the remaining cases are allowed to contain as many positions as 65535 times the
>number of full board positions).
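As a quick check on the counting above, the figures can be reproduced with a short script (a sketch; captures, promotions, and legality are ignored, exactly as in the quoted estimate):

```python
import math

# Occupancy bitmap for the 8x8 board: 64 bits.
bitmap_bits = 64

# Distinct permutations of the 32-symbol string KQRRBBNNPP...kqrr...pp:
# 32! divided by the swap symmetry of identical pieces, i.e. (2!)^6 for
# the R, B, N pairs of both colors and (8!)^2 for the two sets of pawns.
swap_symmetry = math.factorial(2) ** 6 * math.factorial(8) ** 2
permutations = math.factorial(32) // swap_symmetry

perm_bits = math.log2(permutations)
print(f"swap symmetry divisor: {swap_symmetry:.2e}")   # ~1.04e11
print(f"distinct permutations: {permutations:.2e}")    # ~2.52e24
print(f"permutation bits:      {perm_bits:.1f}")       # ~81.1, i.e. 82 whole bits
print(f"total with bitmap:     {bitmap_bits + math.ceil(perm_bits)}")  # 146
```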
>
>For example, if we now include cases for all possible captures (but no
>promotions), in addition to the NP(32) permutations, for each piece (other than
>kings) you can have counts of: Q,q=(0,1), R,N,B,r,n,b=(0,1,2) and for
>P,p=(0,1,...,8) which, when all multiplied together, gives
>M=(2^2)*(3^6)*(9^2)=236196=2.4*10^5=17.8 bits, which is slightly more (by 1.8
>bits) than the 16 spare bits we had, but this is only a very rough upper bound
>on how many positions captures would add, since none of these "captures-present"
>positions will need anywhere near the full NP (82) bits to represent (e.g. 2
>lone kings need only 12 bits instead of 82; or if half of the light pieces, rooks &
>pawns are taken out, so you have distinct permutations of 18 pieces:
>KQRBNPPPPkqrbnpppp, you get NP=18!/(4!)^2=1.1*10^13=43.3 bits instead of the 82
>bits). Adding all such NP's wouldn't add 17.8 bits to the 146 but perhaps only
>7-10 bits.
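The two counts used in this paragraph check out the same way (a sketch):

```python
import math

# Material-content combinations once captures are allowed:
# Q,q in {0,1}; R,N,B,r,n,b in {0,1,2}; P,p in {0,...,8}.
M = 2 ** 2 * 3 ** 6 * 9 ** 2
print(M, f"= {math.log2(M):.1f} bits")             # 236196 = ~17.8 bits

# The 18-piece example KQRBNPPPPkqrbnpppp: only the two groups of
# four pawns are interchangeable, so the divisor is (4!)^2.
np18 = math.factorial(18) // math.factorial(4) ** 2
print(f"{np18:.1e} = {math.log2(np18):.1f} bits")  # ~1.1e13 = ~43.3 bits
```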
>
>For the full number of distinct positions one would first need to add all NP's
>for the 236196 combinations of material content (a small C program would
>probably do it in a few seconds; it would take much longer to write it, though).
>Then, one would add all NP's for all combinations of promotions. As for the
>castle and ep status, only a tiny fraction of positions would qualify as
>candidates for non-default status (default being: no ep and no castle rights),
>so, using the same method of topping off such special cases as described in the
>example of 2 kings + castle status in 12 bits, for the purposes of entropy
>estimates (say, to the nearest bit) these statuses can likely be entirely
>ignored.
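The "small C program" suggested here is quick to sketch in Python: sum the distinct piece placements over all 236196 capture-only material contents. Legality (adjacent kings, pawns on the back ranks, reachability) is ignored, so the result is only a rough upper-bound estimate:

```python
from itertools import product
from math import factorial, log2

fact = [factorial(i) for i in range(65)]

# For one side: Q in {0,1}; R, N, B in {0..2}; P in {0..8}; king always 1.
# Precompute (piece count, identical-piece divisor) for each of 486 contents.
side = []
for q, r, kn, b, p in product(range(2), range(3), range(3), range(3), range(9)):
    count = 1 + q + r + kn + b + p
    divisor = fact[q] * fact[r] * fact[kn] * fact[b] * fact[p]
    side.append((count, divisor))

# Sum distinct boards over all 486 * 486 = 236196 material combinations:
# ordered square choices for the n pieces, divided by identical-piece swaps.
grand_total = 0
for n1, d1 in side:
    for n2, d2 in side:
        n = n1 + n2
        grand_total += fact[64] // (fact[64 - n] * d1 * d2)

print(f"about 2^{log2(grand_total):.1f} placements without legality rules")
```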
>
>In any case, it seems that 160 bits is pretty close (if not perhaps too
>low) to the number of bits in the number of all legal positions (knowing such
>number still doesn't give you a practically usable recipe for encoding).

160 is quite possibly too low and unattainable. I am currently at 161, however,
one portion of my algorithm takes into account a wide variety of possible pawn
column possibilities based on how many pawns and pieces are on the board. This
is extremely complex, so I am not positive that I have it correct. If I do, 161
is attainable. If I do not, the "current" answer is probably closer to 163.

>
>>>Therefore, in a multipass
>>>algorithm, where at any point the algorithm remembers (and can make any
>>>suitable use of) _all_ the info processed so far, one can not only make
>>>assumptions, but know exactly, the attack status of any square, as long
>>>as the earlier pass (or generally, the decoding process so far) has
>>>produced the sufficient piece placement info to generate the needed
>>>legal moves.
>>
>>When you are talking about a database of associated positions, these types of
>>techniques work fine. But for a random legal chess position, the legal move
>>generator will not work since it takes up too many bits. A multipass
>>decoder has a chance of working; however, I have yet to come up with a
>>reason to do it in this manner.
>
>If you use a statistical encoder which can effectively use fractions of a bit,
>such as arithmetic coder, and where the number of bits produced is -log2(p),
>where p is the probability of the next symbol in the stream, then a multipass
>encoder which already knows king positions & castling rights (from earlier
>pass), will be able to remove some impossible pieces from some squares (due to
>illegal checks or castle violations), increasing thus probabilities for the
>remaining possibilities, and thus shortening the output (arithmetic coder can
>take advantage of a small fraction of one bit). How much that would affect the
>worst case, I don't know, but probably not very much (possibly less than a bit;
>OTOH if you've got the worst case down to 160.3 bits without legal move
>generator and, say, it could give you 0.4 bits in the worst case, it may be
>worth it).
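The fractional-bit point can be made concrete with a toy calculation (the 13-symbol square alphabet below is an assumption for illustration, not the poster's actual model):

```python
from math import log2

def code_bits(p):
    """An arithmetic coder spends -log2(p) bits on a symbol of probability p."""
    return -log2(p)

# Hypothetical square alphabet: empty plus 12 piece codes, all equally likely.
print(f"{code_bits(1 / 13):.3f} bits per square")   # ~3.700

# If an earlier pass (king squares, castling rights) rules out 3 of the 13,
# the 10 remaining symbols each get probability 1/10:
print(f"{code_bits(1 / 10):.3f} bits per square")   # ~3.322

# The ~0.38-bit saving is usable only because arithmetic coding accumulates
# fractions of a bit across symbols; Huffman would round each code up.
```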
>
>Another possible use of the legal move generator is to construct a sparser space
>of positions, like stepping stones, which may be in some way simpler to encode,
>and cover the enumeration space in between (i.e. the remaining positions) as all
>positions 1 ply away from the nearest stepping stone position. The reason these
>might save space is that one can pick for the stepping stone positions those
>which have some additional regularity (not present in a general position), such
>as which piece must come before some other piece during a particular order of
>scanning of the board, or how many empty squares have to exist (at least) in
>between some pieces (i.e. we would pick among, say, the 1-ply (back or forth)
>set of positions the one which satisfies best some such regularity). Obviously
>to actually take advantage of any such regularities, one would need a
>statistical coder, so that any knowledge of such regularity would affect
>probabilities (by decreasing the uncertainty) of the content of the next square
>being encoded.
>
>As to the more general multipass algorithm (not just legal move generator), I
>don't have the slightest doubt it could be put in a good use, provided you also
>use the arithmetic (instead of Huffman) encoder in the final pass, to take the
>full advantage of the increased certainty of guessing the next piece (thus
>producing the shorter arithmetic codes, which can use less than 1 bit if the
>probability of next symbol is > 50%) based on the info gathered in the previous
>passes.
>
>
>>The same type of saving would be obtained by combining rooks with kings.
>>
>>No, this is incorrect. The reason is simple enough. Since the 2 kings take up 2
>>squares, that leaves approximately 62 squares for the a1 rook, 61 squares for
>>the h1 rook, 60 squares for the a8 rook, and 59 squares for the h8 rook.
>>Dropping illegal positions for the moment, that results in 13388280 or just
>>under 24 bits for the rooks or 6 bits per rook.
>>
>>Let's do the real math:
>>...
>
>Well, first, as you noted in your correction, the swap between two rooks of the
>same color gives you one bit. You also need to note that the swaps for 2 pairs
>of rooks saves 2 bits, hence this method produces under 34 bits

Yes, approximately 33.1 bits. I realized this after I posted my correction, but
did not bother to get back on to correct it once again.
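For reference, the rook arithmetic behind these figures (a sketch; the exact grand total depends on how the king pair is encoded):

```python
from math import log2

# Ordered placements of the four rooks on the 62 squares the kings leave free:
ordered = 62 * 61 * 60 * 59
print(ordered, f"= {log2(ordered):.2f} bits")    # 13388280, just under 24 bits

# Each same-colored rook pair can be swapped without changing the position,
# so divide by 2 twice -- the two bits saved:
distinct = ordered // 4
print(distinct, f"= {log2(distinct):.2f} bits")  # ~21.7 bits for the rooks
# With roughly 12 more bits for the king pair (castling included), the whole
# K,R,R vs k,r,r placement lands in the 33-34 bit range discussed here.
```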

>(instead of
>under 36 or under 35 as stated earlier). Your other method (#1) could supposedly
>get the "same" thing in 29.3 bits. The problem is that it is not the "same"
>thing. Namely, in the methods considered, the two basic representations were
>used:
>
>A) Square Value method (SV) where squares are scanned (in some predefined order)
>and their content is specified (e.g. empty or some piece & code), and

Another type of SV method that I tried (but it failed) is to change the
direction of scan for the location bitboard (I used 4 directions: 0 degrees, 45
degrees, 90 degrees, and 135 degrees). I then used a simple compression scheme:
00 is represented by 0 (10 stays 10, and 11 stands for either 01 or 11, which
one being determined later); or 01 is represented by 0, etc. The problem is
that it takes up TOO many
bits, just to indicate direction of scan and type of compression. I was hoping
that it would help out in the heavy promotion cases since those would have an
abundance of zeros in them.

For example, with the kings handled by the 12 bits, and all possible promotions
and no pawns, that would leave 62 squares and 26 pieces, or 36 empty squares.
Since this is 58% white space, it seemed a good possibility for bitboard
compression of the same nature.
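The pair code sketched above can be written out as follows (one variant only: 00 -> 0, 10 -> 10, 01/11 -> 11 with the ambiguous bit deferred; direction-of-scan selection is not modeled):

```python
def encode(bits):
    """bits: a '0'/'1' string of even length (the occupancy bitmap)."""
    out, deferred = [], []
    for i in range(0, len(bits), 2):
        pair = bits[i:i + 2]
        if pair == "00":
            out.append("0")
        elif pair == "10":
            out.append("10")
        else:                          # "01" or "11": second bit is 1
            out.append("11")
            deferred.append(pair[0])   # disambiguating bit, sent later
    return "".join(out), "".join(deferred)

def decode(stream, deferred):
    pairs, i, j = [], 0, 0
    while i < len(stream):
        if stream[i] == "0":
            pairs.append("00"); i += 1
        elif stream[i:i + 2] == "10":
            pairs.append("10"); i += 2
        else:                          # "11": fetch the deferred first bit
            pairs.append(deferred[j] + "1"); i += 2; j += 1
    return "".join(pairs)

bitmap = "0000100000011000"            # hypothetical sparse 16-square strip
code, extra = encode(bitmap)
assert decode(code, extra) == bitmap
print(len(bitmap), "->", len(code) + len(extra), "bits")   # 16 -> 12 bits
```

As the post observes, the scheme only pays off when 00 pairs dominate; counting the direction-of-scan and variant-selection header bits on top of this is what made it a net loss.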

Another compression idea of the bitboard with an SV method is to sort the
squares by every nth square where n is a prime number (going back over the board
each time). You could use 5 bits to indicate the number of pieces to look for
and 3 bits to indicate which prime to use (1, 2, 3, 5, 7, 11, 13, or 17). To
enhance this even more, a start position set of bits could be added, but I am
uncertain whether a 6 bit actual location or some form of exponential relative
positioning indicator (fewer bits) would work better.

The encoder finds out which prime works best (i.e. gets the largest number of
piece locations the quickest). You could then use this to "jump" over the blank
squares and use a PA method to indicate the remaining pieces, once it is
determined which pieces are missing. The other advantage of such a scheme is
that since you have probably removed more than 32 squares in the first pass, the
PA would start with a 5-bit pointer instead of a 6-bit pointer and, hopefully,
it would then only be needed for 2 to 5 pieces anyway.

This technique would have to save more than 8 bits of white space in the
bitboard in order to be useful.

However, I haven't explored this idea yet due to the complexities of creating a
rigorous proof.
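A sketch of the stride-scan enumeration (note that 1 is not prime and a stride of 2 only ever revisits half the squares when wrapping mod 64, so this sketch sticks to odd strides; the start-square refinement is the `start` parameter):

```python
from math import gcd

def scan_order(stride, start=0):
    """Visit every stride-th square, wrapping around the 64-square board."""
    assert gcd(stride, 64) == 1, "stride must be odd to visit every square"
    return [(start + k * stride) % 64 for k in range(64)]

def squares_examined(occupied, stride, start=0):
    """How many squares the scan examines before all occupied ones are seen."""
    remaining = set(occupied)
    for steps, sq in enumerate(scan_order(stride, start), 1):
        remaining.discard(sq)
        if not remaining:
            return steps
    return 64

occupied = {0, 7, 27, 36, 56, 63}      # a hypothetical sparse position
for stride in (1, 3, 5, 7, 11, 13, 17):
    print(stride, squares_examined(occupied, stride))
```

The encoder would pick the stride (and start square) minimizing this count, spending the few header bits described above to say which it chose.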

>
>B) Piece Address method (PA), where the available pieces are scanned (using some
>ordering convention) and for each piece its address on the board is given (or
>its equivalent, e.g. in the case of irregular enumerations, such as king pair
>with castle rights, the addresses were given slightly indirectly for the
>can-castle cases).

The other disadvantage of the PA method is that you also have to determine pawn
promotions.

>
>The two "better" methods were of SV type, while the "worse" method was of PA
>type. The main problem is that you cannot count how much space SV type coding
>will use by merely giving the number of bits used on the squares occupied by the
>pieces of interest. The obvious counter-example is a case of just 2 kings and
>2+2 rooks. For the "better" methods, the SV ones, you have counted only the bits
>used by the codes for kings and rooks on the squares they occupy. But to
>reconstruct the position you still need to encode the lengths of 7 segments of empty
>squares (which can range between 0 to 58 squares and whose sum is 58; e.g. the
>58 sequential empty squares occur if you have KRRrrk or KRrRrk or RKRrkr or
>rRkRrk, ... etc type of sequences in the top left or the bottom right corner).
>
>On the other hand, the "bad" method (PA type) already has in its 34 bits
>everything it needs to reconstruct exactly any K,R,R,k,r,r position. By the time
>you add encoding for the empty squares to the two "good" SV methods (so that
>from the transmitted code you can reconstruct the actual exact position), you
>will add at least 7 bits (very likely twice as many), which will take you from
>the 29 to at least 36 bits (with most generous allowances for encoding of the 7
>empty square runs), which becomes worse than the 34 bits upper bound for the
>"bad" method.

This is not quite complete. You have to indicate with a PA method that the white
queen knight is NOT on the board, that the white queen bishop is not on the
board, etc., etc., etc. This takes up a lot of bits (minimum of 30 with some
form of does piece exist bitboard). The total bits would be 1 side to move + 34
kings and rooks + 30 piece exist = 65 bits.

The SV method would take 1 side to move + 62 bit bitboard (max with no
compression) + 12 kings/castling status + 1 which SV algorithm + 3.33 per rook *
4 rooks = 90 bits.

As you add each piece, the PA method adds somewhere between 5 and 6 bits (closer
to 6 bits earlier on, closer to 5 bits as almost all pieces are on the board).
It also adds some bits for pawn promotion. The SV method adds 3.33 bits per
piece (or 3 bits for pieces on the back ranks).

It takes about 11 additional pieces (excluding the 4 rooks) for the SV method to
overtake the PA method in the worst case.
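A back-of-envelope check of that crossover, taking 5.5 bits as an assumed midpoint for the PA per-piece cost:

```python
# Base costs from the K,R,R,k,r,r comparison above.
pa_base, sv_base = 65, 90
pa_per, sv_per = 5.5, 3.33     # assumed average bits per additional piece

n = 0
while pa_base + pa_per * n <= sv_base + sv_per * n:
    n += 1
print(n, "additional pieces before SV overtakes PA")
```

With these assumed averages the break-even lands near 11-12 additional pieces, consistent with the estimate above.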

>
>Generally speaking, the PA methods work better for sparser positions (king pair
>in 12 bits is an obvious example), while SV works better for more filled
>positions. I don't know where exactly the boundary between the "sparse" and the
>"filled" is for this problem, but the 6 piece case, 2 kings and 2+2 rooks
>certainly falls within the "sparse" enough to make the PA method better than SV
>method.

Hmmm. The general mechanism for decreasing the overall minimum number of bits
for the maximum bit intensive positions is by creating a "compression rule" that
works well for the bit intensive positions, but does not work well for the bit
non-intensive positions. An example of this is the EP compression rule
(unfortunately, that one only gives you EP state for free without gaining any
sort of bit decrease for positions). The reason for this is that you do not mind
increasing bit non-intensive positions from 120 bits to 140 bits (or even 160
bits), if it can decrease the bit intensive position by as much as one bit.

I wonder if there is some mechanism to increase the number of bits of the PA
method in order to achieve a decrease in the number of bits of the SV methods. I
doubt there is since the two methods are so drastically different, but I will
think on it.

>
>>Hence, EP does not require 4 more bits (as in your type of representation).
>>
>>Note: I realize that your representation does not need the 4 extra bits
>>UNLESS there is an EP possibility. However, the worse case scenario
>>(i.e. the largest representation) will be an EP case, hence, the 4
>>extra bits will be required for SOME positions.
>
>Yes, but I think that the critical positions (the candidates for the worst case)
>are those where the number of promotions is maximized (since these will use more
>of longer codes needed for pieces instead of shorter codes used for pawns), hence
>they will be the positions which will have no or very few pawns and will have
>the cost of EP encoding close to 0 bits (or, say, with 1 black and 1 white pawn
>you need at most 1 bit for the ep status, with 3+3 pawns you need at most 2
>bits, with 7+7 at most 3 bits, 8+8 at most 4 bits). My worst case EP encoding,
>the 4 bits, occurs only if all pawns are present (with 0-8 ep capture
>candidates, mixed between colors), hence in that case no promotions have
>occurred, and these cases are not critical to the "worst case in 160 bits"
>objective. (Obviously I assume a multipass algorithm to take advantage of
>knowing the potential ep capture targets, thus knowing what size the ep status,
>which may follow in the later pass if necessary, will have.)
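The quoted sizes follow from simple state counting (reading each case as at most k simultaneous ep-capture candidates plus a "no ep" state; a sketch):

```python
from math import ceil, log2

def ep_bits(candidates):
    """Bits for k possible ep targets plus the 'no ep' state."""
    return ceil(log2(candidates + 1))

for k in (1, 3, 7, 8):
    print(f"{k} candidates -> {ep_bits(k)} bit(s)")   # 1, 2, 3, 4
```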

What you say is true, but you forget one thing. Why add even 1 bit for EP when
you can get away with adding 0 bits by representing the exact same position
(with or without EP) in a different manner? The only reason to use your method
would be if you came up with some form of SV algorithm that only works if you do
NOT use my EP method and that SV algorithm saves 2 to 4 bits in the heavy
promotion cases.

>
>
>>This is very interesting, even if it is a mechanism to handle a totally
>>different goal. I like it. I will have to look at it in more detail some day.
>
>Unfortunately, it is completely unsuitable for minimizing the length of the
>worst case positions. It is meant for minimizing the average encoding length
>(i.e. a total size in some small subset of all legal positions). It achieves
>that by, in effect, stretching the worst case to a much longer one. So it is
>outright contrary to your objective.

I am also writing a chess program and I like unusual concepts such as this. For
example, if I could use certain aspects of this such as the bishop fianchetto
recognition to create the upper portion of a hash index, I may be able to even
out more the graph nodes in the hash table. Food for thought.

KarinsDad :)





Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.