# Computer Chess Club Archives

## Messages

### Subject: Counting & Encoding Any Chess Position in 157 bits

Author: Ratko V Tomic

Date: 23:53:57 11/09/99

```In an earlier thread on applying Huffman codes for enumerating and encoding
chess positions, one sub-thread dealt with the issue of what kind of constraints
on number of promotions, expressed in terms of captures for a given Material
Content (MC, i.e. the counts of pieces by piece type) state, can be given. Any
such constraint helps reduce the number of valid MC states, thus reduces the
upper bound on the number of legal positions (hence on the code length needed to
encode an arbitrary position). In that context (recalling his earlier program
enumerating chess positions) Blass Uri wrote:

> My program used this fact [that the #of promotions<= #missing pawns for
> each side] and the fact that:
>
> an upper bound for the number of promoted white pieces is
> 2*the number of captures of black pieces
> +the number of captures of white pieces(I include pawns also as pieces).

As illustrated in the earlier reply to the above, some of the optimal
capture+promotion sequencies (max promotions for min captures) do not reach the
bound above, which means (since these optimal transactions cannot be exceeded by
any other transactions) that a tighter upper bound exists. The tightest upper
bound of this type (on promotions in terms of captured material, its kind and
its quantity) is:

Number of white promotions cannot exceed

2 * (# of black pawns captured)
+ # of black non-pawns captured
+ # of white pieces captured (pawns or non-pawns),

or symbolically:

wp <= 2*bpc + bnc + wc   (1)

(and similar for the reverse colors, via prefix swap w<->b)

Note that in your equation the 2nd term (bnc) has also a factor of 2, making
your upper bound higher whenever there are black non-pawns captured. This bound
is the tightest possible (in these terms) since it is achieved exactly (turned
into equality) by the optimal capture+promotion sequences listed earlier, hence
a lowering of the upper bound would necessarily make it false. Using this bound
(or rather, a derived one in terms of excess and defect in material for a given
MC state, classified by pawns and non-pawns, for white and black), the
enumeration algorithm worked as follows:

1. Compute all Material Content (MC) states for white. The MC state
is defined by counts of all piece types (Q,R,B,N,P).

1.1 There are 8694 distinct MC states for white (the loop actually
iterates exactly this many times)

1.2 The only constraint on the one side MC states is that:

# of excess pieces + # of pawns <= 8

where the "excess pieces" are non-pawns which are not present
in the initial chess positions (i.e. W.Queens above 1, W.Rooks
above 2, etc). This is simple consequence of the fact that
each promotion removes exactly one pawn. Since white can have
(with cooperating opponent) all 8 pawns promoted without having
any of its pieces captured, the above is the strongest constraint
on promotions in terms of piece counts for one side (black or white).

2. For each white MC state, pair it with up to 8694 MC states for
the black, resulting in at most 8694^2=75,585,636 potential pairs
(i.e. the full MC states).

2.1 If the pairing of MC state x with MC state y was computed
then it is not necessary to compute pair y,x (reversed color),
so the pairing loop makes only 8694*8695/2=37,797,165 iterations.

2.2 The new promotion/capture constraint (1) is applied to the pairs,
resulting in rejection of 17,501,326 pair combinations, which
violated the inequality, leaving 58,084,310 complete (black+white)
valid MC states. Hence the MC state can be encoded in 25.792 bits,
i.e. in using a 26 bit whole bit codes.

3. For each valid complete MC state the number of Material
Placements (MP) is computed, and these were added over all the
MC states (58,084,310 such states).

3.1 The number of distinct chess positions was thus obtained as:

N = 2.463947 * 10^46 = 154.110 bits

This is lower than your earlier constraint 3.701*10^46, as one
would expect due to the tighter promotion constraint.

3.2 The maximum placements were generated by the 30 piece symmetrical
MC state given as: W.QRNBP=3,3,2,2,4 B.QRNBP=3,3,2,2,4 and it had
the #MP=3.067753*10^42=141.138 bits (the same value is obtained,
of course, by all other states differing from this one only in
the independent permutations in W & B non-pawn counts).

3.3 The number of MC states which used (for their #MP) 130 bits or
more was counted and found to be 1,016,088 (smaller than
1024^2, i.e. smaller than 20 bits).

3.4 The Bishop optimization (discussed in an earlier sub-thread here)
was not performed in the current version (due to the estimated
total gain of at most few millibits from the 154.110 bits, while
complicating the code great deal compared to the single bishop
type enumeration).

3.5 The pawn optimization (48 available squares instead of 64
squares) was used.

3.6 Castle state was not included in the count, but it was estimated
that it would add less than 0.13 bits to the position code, in
the worst case (check earlier discussion on king pair + castle
status encoding in 12 bits). The ep status also wasn't encoded,
but since there could be at most 4 ep capture targets (white
assumed to move, thus 4 targets are black pawns on the 5-th rank),
the ep would add at most log2(5)=2.32 bits (to the 154.110 bit
position code length; some checks on pawn counts in MC state could
lower this further). Thus the ep+castle status would still keep
the full position code length below 157 bits.

4. An encoding method which achieves uniform (across all positions)
whole-bit code length of 155 bits (or 157 bits when ep & castle
are included) consists of encoding any position with a 2 part
code, MC part (specifying one of the MC states among the 58,084,310
valid MC states) plus an MP part indexing a particular placement
permutation among all the placements generated by this MC state.

The specific steps needed to achieve the uniform length 155-bit
code in this manner are as follows:

4.1 Replace the fixed length 26 bit MC code with a variable length
Huffman code, which uses the Number of MP states (for this MC)
divided by the total number of states (N=2.46*10^46) as the
Huffman weight (probability) for this MC state.

4.2 If, for a given MC state, the #of MPs uses X bits, then the
MC state gets encoded in L(MC)=(155-X) bits. This follows from
the Huffman code length L(MC) = -log2(p) = -log2[(2^X)/N)] =
log2(N)-X = 155-X. The full code length is L(MC)+L(MP) =
= (155-X)+X = 155, i.e. it is a fixed length 155-bit code.

4.3 The encoder/decoder of the MC part of the code would use a
sorted array of one side MC states (8694 entries). The full
MC code (for both sides) would be constructed on the fly
as MC pairs, and their validity would be checked as in the
section 2, 2.1, 2.2. If the optimization for the bishops were
added, the one side MC array would have 26,298 entries

4.4 The MP part of the code (the placement index for a given MC
state) can be encoded/decoded using a variable radix
representation for permutations. Thus it is possible
to actually/practically encode (and decode) all chess positions
using a fixed length 155-bit code (or 157 if ep & castle are
also encoded).

4.5 Given that this uniform code comes quite close to the Karin's
Dad's limit of 160 bits max for any chess position, it seems
unlikely that any code which uses up fewer than 150 bits for
some chess positions would be able to keep the worst case under
the 160 bits target (i.e. the shorter the codes for some
positions, the longer the codes for the others).

To keep this post at a reasonable length, I had omitted various details (which
may be obvious to those thinkering with this problem).

```