Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: hash table size - is a power of 2 still an advantage these days?

Author: Vincent Diepeveen

Date: 19:55:56 09/24/03

Go up one level in this thread


On September 24, 2003 at 17:32:16, Dieter Buerssner wrote:

>On September 24, 2003 at 16:58:20, Gerd Isenberg wrote:
>
>>Yes, but with Dieter's trick, one 32*32 = 64-bit mul is good enough with
>>appropriate table sizes. If you get some speed for nothing, depending on your
>>average cycles per node, and whether you probe in QSearch or not:
>>
>> UINT64	idx = (UINT64)tableSize * (hashKey64 & 0xffffffff);
>> return (UINT)(idx >> 32);
>>
>>DIV - MUL latency (32-bit):
>>Athlon:  40 cycles versus 6 (both vector path).
>>Opteron: 39 cycles (vector path) versus 3 cycles (direct path).
>>AND is one cycle on both.
>
>Gerd, thanks for advertizing my idea :-) One may add, that one can get another
>cycle in many cases, compared to the and method. Robert mentioned 48 byte
>entries. This is not a power of two. So for the actual adress calculation, at
>least one additional instruction will be needed (multiplying be the factor 3
>that is one prime factor of 48). This factor can be multiplied into the
>tableSize above in advance, and when using a macro (probably it is not easy with
>inline here), that addional step will be avoided.
>
>Somthing like
>
>  struct hash_entry *p = (struct hash_entry *)((unsigned char *)(base_adress +
>(size_t)((UINT64)tableSize_times_48 * (hashKey64 & 0xffffffff))));
>
>The casts look horrible, but are Standard C. A decent compiler on x86 should
>create one mul instruction, fetch edx of the result, and add it to base_adress
>(or use an indexed adressing mode). Hidden in a macro, the code will not look
>horrible (and as another advantage, you can easily change the macro, to try
>different methods).
>
>About the speed differences: I found practically none between modulo (on a 32
>bit part of the hash_key, never tried it on 64 bits), mul and the & (2**k-1)
>approach. Modulo really seemed a very little bit slower, but around the noise
>level. Also, for large hash tables, we can expect over 500 cycles for the main
>memory accesses. If we compare this to the instruction timings you have given,
>and also take into account, that other parts of the engine will most probably
>take significantly longer than the time in Hash Probe/Store functions, one
>should conclude, that this is really nothing to worry about.
>
>On architectures, where there mod is no upcode for mod, it might be a measurable
>difference, but I guess still small on any slightly fast computer. When there is
>no mul instruction either ...
>
>Regards,
>Dieter

Yo, all well but i plan to get back to the AND appraoch actually.

The time needed for modulo is the last of my concern, though part of it.
it won't be suprise to you that i run on supercomputer cc-NUMA. Also most
persons own single cpu machine not dual. There is 2 diep versions.

Dual version and single cpu. Yes i care for all my users. Everyone what he wants
:)

Basically the single cpu approach also used for cc NUMA. nice local accessing.

So i should call single cpu = numa ugh.

to the point.

i store 44 bits into hashtable of course.

40 most significant and 4 least significant.

So in total 48 bits.

If i do AND then i am 100% sure that DIEP will use 64 bits to index.

i'm a bit more hesitating whether it does when not using AND.

your trick involves multiplication. i will write out at the bathroom this
morning whether i can garantuee that there is not a single possibility that the
least significant 24 digits (in fact 20 of them) can go to the same slot.

with modulo i seem pretty safe here.

I plan to use 256MB for local hashtable a processor world champs 2003.
that's 2^24 entry size and 16 bytes an entry. So garantuees 64 bits then
using AND.

But it would be real nice if that can be done better. Do you garantuee 64 bits
with your multiplication trick, assuming say 300MB hashtable, and storing 40
most significant bits and 4 least significant bits, assuming 8 probes?

Best regards,
Vincent






This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.