Author: Vincent Diepeveen
Date: 09:45:23 04/15/04
On April 15, 2004 at 09:01:44, Robert Hyatt wrote:

>On April 15, 2004 at 06:05:15, Joachim Rang wrote:
>
>>On April 14, 2004 at 22:49:39, Robert Hyatt wrote:
>>
>>>I just finished some HT on / HT off tests to see how things have changed in
>>>Crafty since some of the recent NUMA-related memory changes that were made.
>>>
>>>Point 1. HT now speeds Crafty up between 5 and 10% max. A year ago this was
>>>30%. What did I learn? Nothing new. Memory waits benefit HT. Eugene and I
>>>worked on removing several shared memory interactions which led to better cache
>>>utilization, less cache invalidates (very slow) and improved performance a good
>>>bit. But at the same time, now HT doesn't have the excessive memory waits it
>>>had before and so the speedup is not as good.
>>>
>>>Point 2. HT now actually slows things down due to SMP overhead. IE I lose 30%
>>>per CPU, roughly, due to SMP overhead. HT now only gives 5-10% back. This is a
>>>net loss. I am now running my dual with HT disabled...
>>>
>>>More as I get more data... Here is two data points however:
>>>
>>>pos1. cpus=2 (no HT) NPS = 2.07M time=18.13
>>>      cpus=4         NPS = 2.08M time=28.76
>>>
>>>pos2. cpus=2         NPS = 1.87M time=58.48
>>>      cpus=4         NPS = 2.01M time=66.00
>>>
>>>First pos HT helps almost none in NPS, costs 10 seconds in search overhead.
>>>Ugly. Position 2 gives about 5% more nps, but again the SMP overhead washes
>>>that out and there is a net loss. I should run the speedup tests several times,
>>>but the NPS numbers don't change much, and the speedup could change. But this
>>>offers enough..
>>
>>
>>In a german Board someone postetd figures for the Fritzmark of Fritz 8. Fritz
>>gains still 25% form HT (in this specific position)
>>
>>cpus=2 NPS = 2.35
>>cpus=4 NPS = 2,95
>>
>>I have unfortunately no information about search time.
>>
>>Does that mean Fritz 8 is poorly optimized?
>>
>>regards Joachim
>
>
>It means it has some cache issues that can be fixed to speed it up further, yes.

Not at all.
Fritz is currently hand-optimized P4 assembly. I expect its author to be working hard on a hand-optimized Opteron assembly version of Fritz now (he has probably been at it for a year already).

One possibility is that in recent years Fritz's evaluation function has become so much slower than it was that it most likely needs an eval hashtable, just like I have used in DIEP for many years.

My guess is that it just uses more hashtables than Crafty. Crafty isn't probing in qsearch, for example. DIEP is. DIEP does a lot more in qsearch than Crafty, so using a transposition table there makes a lot more sense. All commercial programs that I know are doing checks in qsearch (Junior's search is so different that I would bet it is not the case with Junior). So a possible alternative to an evaltable would be hashing in qsearch. An evaltable would be my guess though.

A random lookup into a big hashtable on a 400MHz dual Xeon costs around 400ns when the entry is not in the cache. Even at 3GHz that is just 1200 cycles. Assuming 1 mln nps at 3GHz, one node costs 3000 cycles on average. The vast majority of nodes do not get evaluated at all, of course. That shows that Fritz's eval nowadays needs a multiple of that for evaluation.

When the eval is stored not in the transposition table but only in a special eval table, that gives a hit rate of >= 50% at the evaltable (more likely 60%). So it makes sense for Fritz to use an eval table. Crafty doesn't need one, as its eval is smaller than tiny.
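
To make the numbers above concrete, here is a minimal sketch in Python. It is not code from Fritz, Crafty or DIEP: every name in it (ns_to_cycles, cycles_per_node, break_even_eval_cycles, EvalTable, cached_eval) is hypothetical, and the always-replace scheme is just the simplest choice. The break-even formula is my own simplification and assumes the worst case where every probe misses the CPU cache and pays the full 400ns.

```python
# Illustrative sketch only: all names and sizes here are assumptions,
# not taken from any real engine.

CLOCK_HZ = 3_000_000_000      # 3GHz, as in the post
MISS_LATENCY_NS = 400         # uncached random hashtable lookup, as in the post
NODES_PER_SECOND = 1_000_000  # 1 mln nps, as in the post


def ns_to_cycles(ns, clock_hz=CLOCK_HZ):
    """Convert a latency in nanoseconds to CPU cycles."""
    return ns * clock_hz // 1_000_000_000


def cycles_per_node(nps, clock_hz=CLOCK_HZ):
    """Average cycle budget per searched node."""
    return clock_hz // nps


def break_even_eval_cycles(lookup_cycles, hit_rate):
    """Eval cost above which an eval table pays off (simplified model).

    With a table:    cost = lookup + (1 - hit_rate) * eval
    Without a table: cost = eval
    The table wins when eval > lookup / hit_rate.
    """
    return lookup_cycles / hit_rate


class EvalTable:
    """Fixed-size, always-replace cache from position key to eval score."""

    def __init__(self, size_bits=16):
        self.mask = (1 << size_bits) - 1
        self.keys = [None] * (1 << size_bits)
        self.scores = [0] * (1 << size_bits)
        self.probes = 0
        self.hits = 0

    def probe(self, key):
        """Return the cached score for key, or None on a miss."""
        self.probes += 1
        i = key & self.mask
        if self.keys[i] == key:
            self.hits += 1
            return self.scores[i]
        return None

    def store(self, key, score):
        """Overwrite whatever occupies the slot for key."""
        i = key & self.mask
        self.keys[i] = key
        self.scores[i] = score


def cached_eval(table, key, evaluate):
    """Probe the eval table first; run the full evaluation only on a miss."""
    score = table.probe(key)
    if score is None:
        score = evaluate(key)
        table.store(key, score)
    return score


if __name__ == "__main__":
    print(ns_to_cycles(MISS_LATENCY_NS))        # lookup cost in cycles
    print(cycles_per_node(NODES_PER_SECOND))    # cycle budget per node
    print(break_even_eval_cycles(ns_to_cycles(MISS_LATENCY_NS), 0.5))
```

Under this model, with a 1200-cycle lookup and a 50% hit rate, the table only pays off once an evaluation costs more than about 2400 cycles. That fits the argument above: a tiny eval gains nothing from an eval table, a heavy one does.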
Last modified: Thu, 15 Apr 21 08:11:13 -0700