Author: Tony Werten
Date: 22:10:55 09/25/03
Go up one level in this thread
On September 24, 2003 at 17:32:14, Gerd Isenberg wrote: >On September 24, 2003 at 17:11:14, Sune Fischer wrote: > >>On September 24, 2003 at 16:58:20, Gerd Isenberg wrote: >> >>>On September 24, 2003 at 15:04:36, Sune Fischer wrote: >>> >>>>On September 24, 2003 at 13:25:49, Leen Ammeraal wrote: >>>> >>>>>>The point is a bit of speed. You have to convert a hash signature into a >>>>>>hash table index. For a tablesize that is a power of 2, you can simply >>>>>>AND (mask) off the upper bits leaving a power-of-2 table index. For other >>>>>>sizes, you will end up doing a divide (mod) to get the remainder. The divide >>>>>>is not fast. >>>>>> >>>>>>How significant this is is debatable, but for some of us, "every cycle counts." >>>>>> >>>>>> >>>>>I agree that masking is to be preferred to the modulo operation. >>>>>However, what about tournaments (organized by others) that >>>>>allow a hash table size which corresponds to, for example, >>>>>12 MB entries in your table? >>>>>It seems not wise to me to use only 8 MB entries in this case >>>>>just to make the table length a power of 2. >>>> >>>>My sentiments exactly. >>>> >>>>While Bob is right that speed is a point, there are certainly other points to >>>>consider. >>>> >>>>In a typical modern machine we can expect something like 512 MB of system >>>>memory. Assuming hash entries are of power two, that means an always power 2 >>>>sized table could not be able to use more than 256 MB, or more generally half of >>>>the system memory. I'd call that a waste of resources. >>>> >>>>I always go for the limit myself, which seems to lie somewhere around 300-350 >>>>MB. So I basicly exchange my 0.05% modulo speedloss for a smaller tree due to >>>>larger hash. >>>> >>>>Speed is not *everything* :) >>> >>>Yes, but with Dieter's trick, one 32*32 = 64-bit mul is good enough with >>>appropriate table sizes. If you get some speed for nothing, depending on your >>>average cycles per node, and whether you probe in QSearch or not: >>> >>> UINT64 idx = (UINT64)tableSize * (hashKey64 & 0xffffffff); >>> return (UINT)(idx >> 32); >> >>OK, could be worth a shot :) >> >>>DIV - MUL latency (32-bit): >>>Athlon: 40 cycles versus 6 (both vector path). >>>Opteron: 39 cycles (vector path) versus 3 cycles (direct path). >>>AND is one cycle on both. >> >>Umm is it just me, or hasn't there been a lot of hype about processors being >>able to do several adds and muls in parallel per clock? > >not sure about P4 - may be with some special MMX or SSE SIMD instructions, >vector mul/add, e.g. MMX: > >PMADDWD mmreg1, mmreg2/mem64 > >Multiply signed packed 16-bit values and add the 32-bit results. >On Athlon DirectPath 3 - cycles. > >> >>I remember all they way back to the K6 that it was supposed to be able to do 2 >>adds and 2 mults in one clock, theoreticly. >>But surely I must be mistaken about that? > >no idea about K6, but i can't imagine... You're the specialist Gerd, maybe he means 2 LEA's in parallel ? Tony > >> >>-S. >>>Gerd >>> >>>> >>>>-S. >>>> >>>>>Leen Ammeraal
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.