Computer Chess Club Archives


Search

Terms

Messages

Subject: private % lessons

Author: Gerd Isenberg

Date: 14:48:32 09/16/03


Hi all,

Sorry for continuing my boring, slightly off topic, private math lessons for
autodidactical reasons ;-)

First i want to apologize for suborn you recently, trying an erroneous routine
for div/mod, e.g. for calculating hash indices of fixed sized (pawn hash)
tables.

unsigned int mod49981 (unsigned __int64 a)
{
  unsigned __int64 m = (0x00014FAC00053EB1 * (a >> 32)
                      + 0x00014FAC * (a & 0xffffffff));
  int modulus = (int)(a - (__int64)(m>>32)* 49981);

  if ( modulus < 0 )      // correct negative results
    modulus += 49981;     // due to the remainder of 0.0003
  // oups, that's not enough
  return (unsigned int) modulus;
}

The idea is to do DIV by reciprocal fixed point multiplication like
a * 1/49981, where 1/49981 is precalculated. The integer part of the quotient is
multiplied "back" with the same constant 49981. The remainder is simply the
difference from the divisor:

 a % 49981 := a - (a div 49981)*49981

see Paul Hsieh's formula (modified to 64 bit):

  a % (k*2^t) == a - (((((int)(((2^64)+k-1)/k)) * (a) / (2^64))) & (-2^t))*k

http://www.azillionmonkeys.com/qed/adiv.html

Due to rounding error of the reciprocal constant,
the above routine is _not_ a correct % 49981 one!

Few "remainders" are by up to 4 greater then the maximum of 49980.
For a "modulus"-function a further correction via

  if ( modulus >= 49981 ) // correct wrong results
    modulus -= 49981;     // due to the remainder of 0.0003

is necessary. For table-index calculation one may safe it, considering this in
table allocation, e.g. to allocate 16-entry aligned 50000 instead of 49984,
which is one to small!

One may try odd "k" < 4096 and some "t" > 0 to avoid corrections at all.
For pawn hashing i tried an odd mod-constant with low error margin of the
reciprocal 64-bit constant, but closer to the allocated 64K (e.g. 64577) and
don't care about correct modulus by doing a final AND 0x0ffff, with a slightly
better hit rate:

unsigned int CPawnHashTable::getIndex(const CNode &node)
{
  unsigned __int64 a = node.pawnBB(0) - node.pawnBB(1);
  unsigned __int64 m = 0x000103cd3ddab652 * (a >> 32)
                     + 0x000103cd * (a & 0xffffffff);
  return (unsigned int)((m>>32)*645777 - a) & 0x0ffff;
}

IMHO on opteron the "optimized" assembler output should look like this:

    mov  rax, [node]
    mov  rcx, [rax.m_PawnBB + 0]  ;  white pawns
    sub  rcx, [rax.m_PawnBB + 8]  ; -black pawns
    mov  rax, 0x000103cd3ddab652  ; (((2^64)+645777-1)/645777)
    mul  rcx
    mov  rax, 645777
    mul  rdx
    sub  rax, rax
    and  rax, 0x0ffff

Second mul (mov,sub,and) may use shorter and faster 32-bit ones, but i'm not
sure about possible penalties if using same gp-register 64- and 32-bit wise in a
row. Note that mul on opteron is no longer vector path instructions, 64-bit mul
has latency of 5 cycles, 32-bit mul 3-cycles. 64-bit div/mod instead is vector
path and takes 71 (DIV) or 75 (IDIV) cycles, as reported in latest Software
Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors.

A last comment on incremental zobrist keys for pawn hashing in conjunction with
my program design: My node structure is well aligned, an additional "int" will
blow it up over some bounderies without any severe changes. I hash eval and
store eval + flags in main hash. And i do some (very) lazy eval, where i
conditionally don't even probe pawn hash.

An alibi not to try zobrist keys for pawn hashing at all
(and lack of time of course), sorry!

If you are looking for other reciprocal 64-bit constants you may try and improve
(due to "double" resolution) this routine:

void findGoodReciprocalConstants()
{
  double errorMargin = 0.0000001; // !? 0 is also fine here with double
  double c = 0x100000000;
  c = c * c - 1.0; // 2**64-1
  for (int i = 1; i < 0x10000; i += 2)
  {
    double           fdiv = (c + i)/i;
    unsigned __int64 idiv = fdiv;
    double           rdiv = (double)idiv;
    if ( fdiv >= rdiv - errorMargin && fdiv <= rdiv + errorMargin)
      printf("0x%08x%08x for mod(%5d)\n", (int)(idiv>>32), (int)idiv, i);
  }
}

For division/modulos by odd 64-bit constant less than 4096 a correction
is not necessary.

Regards,
Gerd




This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.