Author: Ratko V Tomic
Date: 22:16:45 10/29/99
Go up one level in this thread
> For example, the castling state information requires 2 bits per side for > 4 additional bits. So, you could replace the 14 bits for kings and > castling status with 12 bits (there are less than 3900 legal white > king/black king pairings on the board, so the "can castle" info can > be placed in the remaining 200+ bits). You don't even need these extra 2 "can castle" bits. Namely, the castle can occur only if the king is on e1 (or e8 for black), while the extra bits allow full freedom of "can castle" for all king placements, i.e. they reserve code space for over ten thousands of bit combinations which can't ever occur (such as king on e2 and "white can castle queenside" etc). This combinatorics can be easily modelled by imagining that you have 3 extra squares on top of e1 (for k, q and k+q castle possibilities) like 3 upper floors (i.e. the ground floor is no-castle allowed). Since there are 3612 king pair non-castling (ground floor) positions (4*60+24*58+36*55), that leaves 484 unused combinations from the 4096 (12 bits available for the king pair). For non-ground floor combinations you need (ignoring the attack or the occupancy by opposing king of the squares between the castling king & potential rook; removing such illegal castle cases would only reduce further the packing space): 1) 3*58=174 for white non-ground (above e1, on floors 1,2,3) and black on the ground (on any of the 58 legal ground floor squares), then 2) 3*58=174 for black king non-ground, white ground, and finally 3) 3*3=9 for both kings above ground, which totals 357 combinations, leaving still 127 unused combinations from the 484 leftover in the 12 bits code. Hence the 2 kings plus the full castling status fit easily in 12 bits with room to spare (i.e. with about 0.05 bits leftover). Using 2+2 rooks (allowing also for a blank-rook state) in addition to the 2 kings, one can further reduce the size of the combined cluster of these 2-6 pieces with castling status (any castle-ok status also defines the content and the attack status on the squares in between the king and the rook, giving thus additional savings). This seems to be the pattern of the general flaw in the proposals I have seen so far in the thread, i.e. a great deal of code space is used for mutually exclusive combinations. In compression one first finds a suitable coding model which doesn't allocate code/symbol space for impossible states (or at least not very much such space), and only then, based on empirical statistics of these symbols one applies Huffman or arithmetic coding to the resulting description. Hence as the first phase one would need a parser or a finite state machine which remembers the entire sequence processed so far (which need not work as a simple single sequential scan of squares, but may perform multiple scans in any useful order), so that at no point it provides code space for impossible placement. For example, regarding kings & castle, if the parser has already traversed board and found white king on e1 and a white rook on h1, and no pieces in between and no black attacks on e1, f1 or g1, only then the parser outputs one of the symbols "wk-castle Ok" or "wk-castle not Ok", otherwise it outputs no castle related symbols (and the decoder doesn't expect any since it replicates parsing steps of the encoder). Similarly for ep pawn state, the parser identifies, say, a white and a black pawn side by side on the 4-th rank and no piece on the 2nd or the 3rd rank (behind white pawn), only then the parser outputs "ep-active" or "ep-not-active" symbol (if there are two black pawns around 1 white pawn, if the 1st black pawn had ep-not-active, the second one has the same automatically, hence no ep-status symbol is output for the 2nd black pawn). As soon as the first "ep-active" symbol occurs, no more ep symbols are being output (since if there is another black pawn on the 4-th rank on the other side of this ep-active white pawn, the 2nd black pawn has ep rights automatically, and since there can be no other ep-active symbols there is no need to output either any more). (Of course, the ep-status symbols need not be output until the entire board is already scanned and the piece placements are known to the encoder and the decoder; each then knows how many candidates, if any, are there and thus how many such symbols, at most, will follow; since the sequence terminates with the 1st ep-active symbol, that one need not be output at all but only a count of ep-not-active symbols in the range 0-Max_EP is produced). For promotions it would useful to give, at least, the counts of w/b promotions upfront (and possibly the type of promotion). That way one can know right away the max number of pawns for each side and one can keep track of when promotions are done and which pieces can still occur. Only after the non-redundant encoding scheme (a sort of mini language) is created one would go through the target sample of positions and count frequencies of various symbols and apply Huffman or arithmetic encoding (i.e. statistical encoding) of the symbol sequence. The Huffman or arithmetic code tables can also vary from square to square. Or if one is really ambitious, one may use dictionary methods as the second phase and finally the statistical encoding of the dictionary symbols. The dictionary methods would derive economy from the commonly found clusters of pieces and their typical placements.
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.