Computer Chess Club Archives




Subject: Counting & Encoding Any Chess Position in 157 bits

Author: Ratko V Tomic

Date: 23:53:57 11/09/99

In an earlier thread on applying Huffman codes for enumerating and encoding
chess positions, one sub-thread dealt with the issue of what kind of constraints
on number of promotions, expressed in terms of captures for a given Material
Content (MC, i.e. the counts of pieces by piece type) state, can be given. Any
such constraint helps reduce the number of valid MC states, thus reduces the
upper bound on the number of legal positions (hence on the code length needed to
encode an arbitrary position). In that context (recalling his earlier program
enumerating chess positions) Blass Uri wrote:

> My program used this fact [that the #of promotions<= #missing pawns for
> each side] and the fact that:
> an upper bound for the number of promoted white pieces is
> 2*the number of captures of black pieces
> +the number of captures of white pieces(I include pawns also as pieces).

As illustrated in the earlier reply to the above, some of the optimal
capture+promotion sequencies (max promotions for min captures) do not reach the
bound above, which means (since these optimal transactions cannot be exceeded by
any other transactions) that a tighter upper bound exists. The tightest upper
bound of this type (on promotions in terms of captured material, its kind and
its quantity) is:

Number of white promotions cannot exceed

    2 * (# of black pawns captured)
  + # of black non-pawns captured
  + # of white pieces captured (pawns or non-pawns),

or symbolically:

      wp <= 2*bpc + bnc + wc   (1)

     (and similar for the reverse colors, via prefix swap w<->b)

Note that in your equation the 2nd term (bnc) has also a factor of 2, making
your upper bound higher whenever there are black non-pawns captured. This bound
is the tightest possible (in these terms) since it is achieved exactly (turned
into equality) by the optimal capture+promotion sequences listed earlier, hence
a lowering of the upper bound would necessarily make it false. Using this bound
(or rather, a derived one in terms of excess and defect in material for a given
MC state, classified by pawns and non-pawns, for white and black), the
enumeration algorithm worked as follows:

 1. Compute all Material Content (MC) states for white. The MC state
    is defined by counts of all piece types (Q,R,B,N,P).

    1.1 There are 8694 distinct MC states for white (the loop actually
        iterates exactly this many times)

    1.2 The only constraint on the one side MC states is that:

           # of excess pieces + # of pawns <= 8

        where the "excess pieces" are non-pawns which are not present
        in the initial chess positions (i.e. W.Queens above 1, W.Rooks
        above 2, etc). This is simple consequence of the fact that
        each promotion removes exactly one pawn. Since white can have
        (with cooperating opponent) all 8 pawns promoted without having
        any of its pieces captured, the above is the strongest constraint
        on promotions in terms of piece counts for one side (black or white).

 2. For each white MC state, pair it with up to 8694 MC states for
    the black, resulting in at most 8694^2=75,585,636 potential pairs
    (i.e. the full MC states).

    2.1 If the pairing of MC state x with MC state y was computed
        then it is not necessary to compute pair y,x (reversed color),
        so the pairing loop makes only 8694*8695/2=37,797,165 iterations.

    2.2 The new promotion/capture constraint (1) is applied to the pairs,
        resulting in rejection of 17,501,326 pair combinations, which
        violated the inequality, leaving 58,084,310 complete (black+white)
        valid MC states. Hence the MC state can be encoded in 25.792 bits,
        i.e. in using a 26 bit whole bit codes.

 3. For each valid complete MC state the number of Material
    Placements (MP) is computed, and these were added over all the
    MC states (58,084,310 such states).

    3.1 The number of distinct chess positions was thus obtained as:

              N = 2.463947 * 10^46 = 154.110 bits

        This is lower than your earlier constraint 3.701*10^46, as one
        would expect due to the tighter promotion constraint.

    3.2 The maximum placements were generated by the 30 piece symmetrical
        MC state given as: W.QRNBP=3,3,2,2,4 B.QRNBP=3,3,2,2,4 and it had
        the #MP=3.067753*10^42=141.138 bits (the same value is obtained,
        of course, by all other states differing from this one only in
        the independent permutations in W & B non-pawn counts).

    3.3 The number of MC states which used (for their #MP) 130 bits or
        more was counted and found to be 1,016,088 (smaller than
        1024^2, i.e. smaller than 20 bits).

    3.4 The Bishop optimization (discussed in an earlier sub-thread here)
        was not performed in the current version (due to the estimated
        total gain of at most few millibits from the 154.110 bits, while
        complicating the code great deal compared to the single bishop
        type enumeration).

    3.5 The pawn optimization (48 available squares instead of 64
        squares) was used.

    3.6 Castle state was not included in the count, but it was estimated
        that it would add less than 0.13 bits to the position code, in
        the worst case (check earlier discussion on king pair + castle
        status encoding in 12 bits). The ep status also wasn't encoded,
        but since there could be at most 4 ep capture targets (white
        assumed to move, thus 4 targets are black pawns on the 5-th rank),
        the ep would add at most log2(5)=2.32 bits (to the 154.110 bit
        position code length; some checks on pawn counts in MC state could
        lower this further). Thus the ep+castle status would still keep
        the full position code length below 157 bits.

  4. An encoding method which achieves uniform (across all positions)
     whole-bit code length of 155 bits (or 157 bits when ep & castle
     are included) consists of encoding any position with a 2 part
     code, MC part (specifying one of the MC states among the 58,084,310
     valid MC states) plus an MP part indexing a particular placement
     permutation among all the placements generated by this MC state.

     The specific steps needed to achieve the uniform length 155-bit
     code in this manner are as follows:

     4.1 Replace the fixed length 26 bit MC code with a variable length
         Huffman code, which uses the Number of MP states (for this MC)
         divided by the total number of states (N=2.46*10^46) as the
         Huffman weight (probability) for this MC state.

     4.2 If, for a given MC state, the #of MPs uses X bits, then the
         MC state gets encoded in L(MC)=(155-X) bits. This follows from
         the Huffman code length L(MC) = -log2(p) = -log2[(2^X)/N)] =
         log2(N)-X = 155-X. The full code length is L(MC)+L(MP) =
         = (155-X)+X = 155, i.e. it is a fixed length 155-bit code.

     4.3 The encoder/decoder of the MC part of the code would use a
         sorted array of one side MC states (8694 entries). The full
         MC code (for both sides) would be constructed on the fly
         as MC pairs, and their validity would be checked as in the
         section 2, 2.1, 2.2. If the optimization for the bishops were
         added, the one side MC array would have 26,298 entries
         (instead 8694).

     4.4 The MP part of the code (the placement index for a given MC
         state) can be encoded/decoded using a variable radix
         representation for permutations. Thus it is possible
         to actually/practically encode (and decode) all chess positions
         using a fixed length 155-bit code (or 157 if ep & castle are
         also encoded).

     4.5 Given that this uniform code comes quite close to the Karin's
         Dad's limit of 160 bits max for any chess position, it seems
         unlikely that any code which uses up fewer than 150 bits for
         some chess positions would be able to keep the worst case under
         the 160 bits target (i.e. the shorter the codes for some
         positions, the longer the codes for the others).

To keep this post at a reasonable length, I had omitted various details (which
may be obvious to those thinkering with this problem).

This page took 0.02 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.