Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: hash table size - is a power of 2 still an advantage these days?

Author: Tony Werten

Date: 22:10:55 09/25/03

Go up one level in this thread


On September 24, 2003 at 17:32:14, Gerd Isenberg wrote:

>On September 24, 2003 at 17:11:14, Sune Fischer wrote:
>
>>On September 24, 2003 at 16:58:20, Gerd Isenberg wrote:
>>
>>>On September 24, 2003 at 15:04:36, Sune Fischer wrote:
>>>
>>>>On September 24, 2003 at 13:25:49, Leen Ammeraal wrote:
>>>>
>>>>>>The point is a bit of speed.   You have to convert a hash signature into a
>>>>>>hash table index.  For a tablesize that is a power of 2, you can simply
>>>>>>AND (mask) off the upper bits leaving a power-of-2 table index.  For other
>>>>>>sizes, you will end up doing a divide (mod) to get the remainder.  The divide
>>>>>>is not fast.
>>>>>>
>>>>>>How significant this is is debatable, but for some of us, "every cycle counts."
>>>>>>
>>>>>>
>>>>>I agree that masking is to be preferred to the modulo operation.
>>>>>However, what about tournaments (organized by others) that
>>>>>allow a hash table size which corresponds to, for example,
>>>>>12 MB entries in your table?
>>>>>It seems not wise to me to use only 8 MB entries in this case
>>>>>just to make the table length a power of 2.
>>>>
>>>>My sentiments exactly.
>>>>
>>>>While Bob is right that speed is a point, there are certainly other points to
>>>>consider.
>>>>
>>>>In a typical modern machine we can expect something like 512 MB of system
>>>>memory. Assuming hash entries are of power two, that means an always power 2
>>>>sized table could not be able to use more than 256 MB, or more generally half of
>>>>the system memory. I'd call that a waste of resources.
>>>>
>>>>I always go for the limit myself, which seems to lie somewhere around 300-350
>>>>MB. So I basicly exchange my 0.05% modulo speedloss for a smaller tree due to
>>>>larger hash.
>>>>
>>>>Speed is not *everything* :)
>>>
>>>Yes, but with Dieter's trick, one 32*32 = 64-bit mul is good enough with
>>>appropriate table sizes. If you get some speed for nothing, depending on your
>>>average cycles per node, and whether you probe in QSearch or not:
>>>
>>> UINT64	idx = (UINT64)tableSize * (hashKey64 & 0xffffffff);
>>> return (UINT)(idx >> 32);
>>
>>OK, could be worth a shot :)
>>
>>>DIV - MUL latency (32-bit):
>>>Athlon:  40 cycles versus 6 (both vector path).
>>>Opteron: 39 cycles (vector path) versus 3 cycles (direct path).
>>>AND is one cycle on both.
>>
>>Umm is it just me, or hasn't there been a lot of hype about processors being
>>able to do several adds and muls in parallel per clock?
>
>not sure about P4 - may be with some special MMX or SSE SIMD instructions,
>vector mul/add, e.g. MMX:
>
>PMADDWD mmreg1, mmreg2/mem64
>
>Multiply signed packed 16-bit values and add the 32-bit results.
>On Athlon DirectPath 3 - cycles.
>
>>
>>I remember all they way back to the K6 that it was supposed to be able to do 2
>>adds and 2 mults in one clock, theoreticly.
>>But surely I must be mistaken about that?
>
>no idea about K6, but i can't imagine...

You're the specialist Gerd, maybe he means 2 LEA's in parallel ?

Tony

>
>>
>>-S.
>>>Gerd
>>>
>>>>
>>>>-S.
>>>>
>>>>>Leen Ammeraal



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.