Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: private % lessons - oups

Author: Gerd Isenberg

Date: 15:05:44 09/16/03

Go up one level in this thread


On September 16, 2003 at 17:48:32, Gerd Isenberg wrote:

>Hi all,
>
>Sorry for continuing my boring, slightly off topic, private math lessons for
>autodidactical reasons ;-)
>
>First i want to apologize for suborn you recently, trying an erroneous routine
>for div/mod, e.g. for calculating hash indices of fixed sized (pawn hash)
>tables.
>
>unsigned int mod49981 (unsigned __int64 a)
>{
>  unsigned __int64 m = (0x00014FAC00053EB1 * (a >> 32)
>                      + 0x00014FAC * (a & 0xffffffff));
>  int modulus = (int)(a - (__int64)(m>>32)* 49981);
>
>  if ( modulus < 0 )      // correct negative results
>    modulus += 49981;     // due to the remainder of 0.0003
>  // oups, that's not enough
>  return (unsigned int) modulus;
>}
>
>The idea is to do DIV by reciprocal fixed point multiplication like
>a * 1/49981, where 1/49981 is precalculated. The integer part of the quotient is
>multiplied "back" with the same constant 49981. The remainder is simply the
>difference from the divisor:
>
> a % 49981 := a - (a div 49981)*49981
>
>see Paul Hsieh's formula (modified to 64 bit):
>
>  a % (k*2^t) == a - (((((int)(((2^64)+k-1)/k)) * (a) / (2^64))) & (-2^t))*k
>
>http://www.azillionmonkeys.com/qed/adiv.html
>
>Due to rounding error of the reciprocal constant,
>the above routine is _not_ a correct % 49981 one!
>
>Few "remainders" are by up to 4 greater then the maximum of 49980.
>For a "modulus"-function a further correction via
>
>  if ( modulus >= 49981 ) // correct wrong results
>    modulus -= 49981;     // due to the remainder of 0.0003
>
>is necessary. For table-index calculation one may safe it, considering this in
>table allocation, e.g. to allocate 16-entry aligned 50000 instead of 49984,
>which is one to small!
>
>One may try odd "k" < 4096 and some "t" > 0 to avoid corrections at all.
>For pawn hashing i tried an odd mod-constant with low error margin of the
>reciprocal 64-bit constant, but closer to the allocated 64K (e.g. 64577) and
>don't care about correct modulus by doing a final AND 0x0ffff, with a slightly
>better hit rate:
>
unsigned int CPawnHashTable::getIndex(const CNode &node)
{
  unsigned __int64 a = node.pawnBB(0) - node.pawnBB(1);
  unsigned __int64 m = 0x000103cd3ddab652 * (a >> 32)
                     + 0x000103cd * (a & 0xffffffff);
  return (unsigned int)((m>>32)*645777 - a) & 0x0ffff;
                                   ^^^ oups, typo should be 64577
}
>
>IMHO on opteron the "optimized" assembler output should look like this:
>
    mov  rax, [node]
    mov  rcx, [rax.m_PawnBB + 0]  ;  white pawns
    sub  rcx, [rax.m_PawnBB + 8]  ; -black pawns
    mov  rax, 0x000103cd3ddab652  ; (((2^64)+645777-1)/645777)
    mul  rcx
    mov  rax, 64577   ; oups, typo
    mul  rdx
    sub  rax, rax
    and  rax, 0x0ffff
>
>Second mul (mov,sub,and) may use shorter and faster 32-bit ones, but i'm not
>sure about possible penalties if using same gp-register 64- and 32-bit wise in a
>row. Note that mul on opteron is no longer vector path instructions, 64-bit mul
>has latency of 5 cycles, 32-bit mul 3-cycles. 64-bit div/mod instead is vector
>path and takes 71 (DIV) or 75 (IDIV) cycles, as reported in latest Software
>Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors.
>
>A last comment on incremental zobrist keys for pawn hashing in conjunction with
>my program design: My node structure is well aligned, an additional "int" will
>blow it up over some bounderies without any severe changes. I hash eval and
>store eval + flags in main hash. And i do some (very) lazy eval, where i
>conditionally don't even probe pawn hash.
>
>An alibi not to try zobrist keys for pawn hashing at all
>(and lack of time of course), sorry!
>
>If you are looking for other reciprocal 64-bit constants you may try and improve
>(due to "double" resolution) this routine:
>
>void findGoodReciprocalConstants()
>{
>  double errorMargin = 0.0000001; // !? 0 is also fine here with double
>  double c = 0x100000000;
>  c = c * c - 1.0; // 2**64-1
>  for (int i = 1; i < 0x10000; i += 2)
>  {
>    double           fdiv = (c + i)/i;
>    unsigned __int64 idiv = fdiv;
>    double           rdiv = (double)idiv;
>    if ( fdiv >= rdiv - errorMargin && fdiv <= rdiv + errorMargin)
>      printf("0x%08x%08x for mod(%5d)\n", (int)(idiv>>32), (int)idiv, i);
>  }
>}
>
>For division/modulos by odd 64-bit constant less than 4096 a correction
>is not necessary.
>
>Regards,
>Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.