Author: Vincent Diepeveen
Date: 07:08:34 04/16/04
On April 16, 2004 at 05:47:42, Vasik Rajlich wrote:

>On April 15, 2004 at 13:10:26, Robert Hyatt wrote:
>
>>On April 15, 2004 at 12:45:23, Vincent Diepeveen wrote:
>>
>>>On April 15, 2004 at 09:01:44, Robert Hyatt wrote:
>>>
>>>>On April 15, 2004 at 06:05:15, Joachim Rang wrote:
>>>>
>>>>>On April 14, 2004 at 22:49:39, Robert Hyatt wrote:
>>>>>
>>>>>>I just finished some HT on / HT off tests to see how things have changed in
>>>>>>Crafty since some of the recent NUMA-related memory changes that were made.
>>>>>>
>>>>>>Point 1. HT now speeds Crafty up between 5 and 10% max. A year ago this was
>>>>>>30%. What did I learn? Nothing new. Memory waits benefit HT. Eugene and I
>>>>>>worked on removing several shared memory interactions which led to better cache
>>>>>>utilization, less cache invalidates (very slow) and improved performance a good
>>>>>>bit. But at the same time, now HT doesn't have the excessive memory waits it
>>>>>>had before and so the speedup is not as good.
>>>>>>
>>>>>>Point 2. HT now actually slows things down due to SMP overhead. IE I lose 30%
>>>>>>per CPU, roughly, due to SMP overhead. HT now only gives 5-10% back. This is a
>>>>>>net loss. I am now running my dual with HT disabled...
>>>>>>
>>>>>>More as I get more data... Here are two data points however:
>>>>>>
>>>>>>pos1.  cpus=2 (no HT)  NPS = 2.07M  time=18.13
>>>>>>       cpus=4          NPS = 2.08M  time=28.76
>>>>>>
>>>>>>pos2.  cpus=2          NPS = 1.87M  time=58.48
>>>>>>       cpus=4          NPS = 2.01M  time=66.00
>>>>>>
>>>>>>First pos HT helps almost none in NPS, costs 10 seconds in search overhead.
>>>>>>Ugly. Position 2 gives about 5% more nps, but again the SMP overhead washes
>>>>>>that out and there is a net loss. I should run the speedup tests several times,
>>>>>>but the NPS numbers don't change much, and the speedup could change. But this
>>>>>>offers enough..
>>>>>
>>>>>
>>>>>In a German board someone posted figures for the Fritzmark of Fritz 8. Fritz
>>>>>gains still 25% from HT (in this specific position)
>>>>>
>>>>>cpus=2 NPS = 2.35
>>>>>cpus=4 NPS = 2.95
>>>>>
>>>>>I have unfortunately no information about search time.
>>>>>
>>>>>Does that mean Fritz 8 is poorly optimized?
>>>>>
>>>>>regards Joachim
>>>>
>>>>
>>>>It means it has some cache issues that can be fixed to speed it up further, yes.
>>>
>>>Not at all.
>>>
>>>Fritz is p4 hand optimized assembly currently. I expect him to work hard on an
>>>opteron hand optimized assembly version from fritz now (probably already 1 year
>>>working at it by now).
>>
>>Sorry, but you should stick to topics you know something about. SMT works best
>>in programs where there are memory reads/writes that stall a thread. As you
>>work out those stalls, SMT pays off less gain. My current numbers clearly show
>>this as opposed to the numbers I (and others) posted when I first got my SMT
>>box...
>>
>>>
>>>A possibility could be that last years Fritz evaluation function has become so
>>>much slower than it was that it has most likely a need for an eval hashtable,
>>>just like i use in DIEP already for many years.
>>>
>>>My guess is that it just uses more hashtables than crafty. Crafty isn't probing
>>>in qsearch for example. DIEP is. Diep's doing a lot more stuff in qsearch
>>>than crafty. So using a transposition table there makes a lot more sense.
>>
>>That is possible. However, as I said, it is a trade-off. I took hash out of
>>q-search and it was perfectly break-even. Tree grew a bit but the search got
>>proportionally faster. No gain or loss. Yet it results in lower bandwidth and
>>with the PIV long cache line, it is probably (at least for Crafty) better than a
>>break-even deal today.
>>
>>>
>>>All commercial programs that i know (junior's search is so different that i
>>>would bet it is not the case with junior) are doing checks in qsearch.
>>
>>But he does not even hash probe in last ply of normal search..
>>
>>And it appears he has no q-search.
>>
>
>Why do you say this?

I drew that conclusion a few years ago. It need not be the case in Junior nowadays.

>I guess the only alternative to q-search is some sort of an SEE at depth == 0.
>Or is there some other possibility?

Suppose that in the last few plies you just do a tactical verification search or whatever, and that you rely upon piece-square tables.

You can do a slow 'makemove' of course and then evaluate. You can also throw away the entire qsearch and make a small list of attacked pieces for white and attacked pieces for black.

Then return the evaluation + canwin(side);

Keep the canwin function simple. That's very quick.

By the way, this is already in a book I read by Jaap v/d Herik. Written around 1984 or so...

>>>
>>>So a possible alternative to evaltable would be hashing in qsearch.
>>>
>>>Evaltable would be my guess though. Doing a random lookup to a big hashtable at
>>>a 400Mhz dual Xeon costs when it is not in the cache around 400ns.
>>>
>>
>>Depends. Use big memory pages and it costs 150 ns. No TLB thrashing then.
>>
>
>Based on some reading that I did on the k8, it seemed that a memory lookup was
>around 150 cycles there. (And 250 cycles on k7.)
>
>Did I misunderstand? Or does this number change when you use multiple
>processors?
>
>If so, then hashing should be done differently on multiple processors than on
>single processors. For example, ETC would behave differently.
>
>Vas
>
>>
>>>That's even at 3Ghz just 1200 cycles.
>>>
>>>1 node on average costs assuming 1 mln nps at 3Ghz : 3000 cycles.
>>>
>>>Vast majority of nodes do not get evaluated at all of course.
>>>
>>>That shows that Fritz' eval needs a multiple of that for evaluation nowadays.
>>>
>>>When not storing eval in transpositiontable but only in a special eval table,
>>>that will give a >= 50% lookuprate at evaltable (more likely 60%).
>>>
>>>So it makes sense to use an eval table for Fritz.
>>>
>>>Something crafty doesn't need as its eval is smaller than tiny.
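
To make the eval table idea quoted above concrete, here is a minimal sketch in C. It is not taken from Fritz, Crafty or DIEP. The 64-bit position key, the table size, the always-replace policy and the stand-in `slow_eval` function are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size: 2^16 entries, must stay a power of two. */
enum { EVAL_TABLE_BITS = 16, EVAL_TABLE_SIZE = 1 << EVAL_TABLE_BITS };

typedef struct {
    uint64_t key;    /* full position key, kept to detect slot collisions */
    int32_t  score;  /* cached static evaluation */
    int32_t  valid;  /* 0 until the slot has been written once */
} EvalEntry;

static EvalEntry eval_table[EVAL_TABLE_SIZE];
static unsigned long eval_calls;  /* number of slow evaluations performed */

/* Stand-in for an expensive evaluation (assumption: a real engine would
 * evaluate the board here, not the key). */
static int slow_eval(uint64_t key)
{
    eval_calls++;
    return (int)(key % 201) - 100;
}

/* Return the evaluation for a position key, using the table when the
 * slot holds the same key, otherwise evaluating and overwriting the slot. */
int cached_eval(uint64_t key)
{
    assert((EVAL_TABLE_SIZE & (EVAL_TABLE_SIZE - 1)) == 0);
    EvalEntry *e = &eval_table[key & (EVAL_TABLE_SIZE - 1)];
    if (e->valid && e->key == key)
        return e->score;          /* hit: the slow evaluation is skipped */
    e->key = key;
    e->score = slow_eval(key);    /* miss: evaluate and always replace */
    e->valid = 1;
    return e->score;
}
```

Whether this pays off depends on the numbers discussed above: a lookup that misses the cache costs a memory access on top of the evaluation, so the table only helps if the evaluation it avoids is clearly more expensive than that access and the hit rate is high enough.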
>>
>>small != bad, however.
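
The "evaluation + canwin(side)" idea above can be sketched roughly as follows in C. This is only one guess at what a simple canwin might look like; the post does not define it. The piece values, the `Attacked` struct and the rule "undefended piece wins its full value, defended piece wins value minus cheapest attacker" are all assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical piece values in centipawns, not taken from any engine. */
enum { PAWN = 100, KNIGHT = 300, BISHOP = 300, ROOK = 500, QUEEN = 900 };

enum { MAX_ATTACKED = 16 };

/* One enemy piece that the side to move attacks. */
typedef struct {
    int value;              /* value of the attacked piece */
    int cheapest_attacker;  /* value of our cheapest piece attacking it */
    int defended;           /* nonzero if the opponent defends it */
} Attacked;

/* Material we expect to win on a single target under the simple rule
 * described in the lead-in (an assumption, not a full SEE). */
static int target_gain(const Attacked *a)
{
    if (!a->defended)
        return a->value;
    int gain = a->value - a->cheapest_attacker;
    return gain > 0 ? gain : 0;
}

/* canwin: the best single gain over the list of attacked enemy pieces. */
int canwin(const Attacked *targets, int count)
{
    assert(count >= 0 && count <= MAX_ATTACKED);
    int best = 0;
    for (int i = 0; i < count; i++) {
        int g = target_gain(&targets[i]);
        if (g > best)
            best = g;
    }
    return best;
}

/* Leaf value without any qsearch: static evaluation plus canwin. */
int leaf_value(int static_eval, const Attacked *targets, int count)
{
    return static_eval + canwin(targets, count);
}
```

For example, with a defended rook attacked by a knight and an undefended pawn attacked by a queen, canwin picks the rook (500 - 300 = 200) over the pawn (100). No move is made or unmade, which is where the speed comes from; the price is that it sees nothing beyond one exchange.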
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.