Computer Chess Club Archives

Subject: Re: Some new hyper-threading info.

Author: Vincent Diepeveen
Date: 07:08:34 04/16/04

On April 16, 2004 at 05:47:42, Vasik Rajlich wrote:

>On April 15, 2004 at 13:10:26, Robert Hyatt wrote:
>
>>On April 15, 2004 at 12:45:23, Vincent Diepeveen wrote:
>>
>>>On April 15, 2004 at 09:01:44, Robert Hyatt wrote:
>>>
>>>>On April 15, 2004 at 06:05:15, Joachim Rang wrote:
>>>>
>>>>>On April 14, 2004 at 22:49:39, Robert Hyatt wrote:
>>>>>
>>>>>>I just finished some HT on / HT off tests to see how things have changed in
>>>>>>Crafty since some of the recent NUMA-related memory changes that were made.
>>>>>>
>>>>>>Point 1.  HT now speeds Crafty up between 5 and 10% max.  A year ago this was
>>>>>>30%.  What did I learn?  Nothing new.  Memory waits benefit HT.  Eugene and I
>>>>>>worked on removing several shared memory interactions which led to better cache
>>>>>>utilization, less cache invalidates (very slow) and improved performance a good
>>>>>>bit.  But at the same time, now HT doesn't have the excessive memory waits it
>>>>>>had before and so the speedup is not as good.
>>>>>>
>>>>>>Point 2.  HT now actually slows things down due to SMP overhead.  IE I lose 30%
>>>>>>per CPU, roughly, due to SMP overhead.  HT now only gives 5-10% back.  This is a
>>>>>>net loss.  I am now running my dual with HT disabled...
>>>>>>
>>>>>>More as I get more data...  Here are two data points, however:
>>>>>>
>>>>>>pos1.  cpus=2 (no HT)  NPS = 2.07M  time=18.13
>>>>>>       cpus=4          NPS = 2.08M  time=28.76
>>>>>>
>>>>>>pos2.  cpus=2          NPS = 1.87M  time=58.48
>>>>>>       cpus=4          NPS = 2.01M  time=66.00
>>>>>>
>>>>>>First pos HT helps almost none in NPS, costs 10 seconds in search overhead.
>>>>>>Ugly.  Position 2 gives about 5% more nps, but again the SMP overhead washes
>>>>>>that out and there is a net loss.  I should run the speedup tests several times,
>>>>>>but the NPS numbers don't change much, and the speedup could change.  But this
>>>>>>offers enough..
>>>>>
>>>>>
>>>>>On a German board someone posted figures for the Fritzmark of Fritz 8. Fritz
>>>>>still gains 25% from HT (in this specific position)
>>>>>
>>>>>cpus=2    NPS = 2.35
>>>>>cpus=4    NPS = 2.95
>>>>>
>>>>>Unfortunately I have no information about search time.
>>>>>
>>>>>Does that mean Fritz 8 is poorly optimized?
>>>>>
>>>>>regards Joachim
>>>>
>>>>
>>>>It means it has some cache issues that can be fixed to speed it up further, yes.
>>>
>>>Not at all.
>>>
>>>Fritz is currently hand-optimized P4 assembly. I expect him to be working hard
>>>on a hand-optimized Opteron assembly version of Fritz now (probably 1 year
>>>into it already).
>>
>>Sorry, but you should stick to topics you know something about.  SMT works best
>>in programs where there are memory reads/writes that stall a thread.  As you
>>work out those stalls, SMT pays off less.  My current numbers clearly show
>>this, as opposed to the numbers I (and others) posted when I first got my SMT
>>box...
>>
>>>
>>>A possibility could be that in recent years Fritz's evaluation function has
>>>become so much slower than it was that it most likely needs an eval hashtable,
>>>just like I have used in DIEP for many years already.
>>>
>>>My guess is that it just uses more hashtables than Crafty. Crafty isn't probing
>>>in qsearch, for example. DIEP is. DIEP does a lot more in qsearch
>>>than Crafty, so using a transposition table there makes a lot more sense.
>>
>>That is possible.  However, as I said, it is a trade-off.  I took hash out of
>>q-search and it was perfectly break-even.  Tree grew a bit but the search got
>>proportionally faster.  No gain or loss.  Yet it results in lower bandwidth and
>>with the PIV long cache line, it is probably (at least for Crafty) better than a
>>break-even deal today.
>>
>>>
>>>All commercial programs that i know (junior's search is so different that i
>>>would bet it is not the case with junior) are doing checks in qsearch.
>>
>>But he does not even hash probe in last ply of normal search..
>>
>>And it appears he has no q-search.
>>
>
>Why do you say this?

I drew that conclusion a few years ago. It need not still be the case in Junior
nowadays.

>I guess the only alternative to q-search is some sort of an SEE at depth == 0.
>Or is there some other possibility?

Suppose that in the last few plies you just do a tactical verification search or
whatever, and that you rely on piece-square tables.

You can do a slow 'makemove' of course and then evaluate.

You can also throw away the entire qsearch and make a small list of attacked
pieces for white and attacked pieces for black.

Then return the evaluation + canwin(side);

Keep the canwin function simple.

That's very quick.
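
A minimal sketch of the "evaluation + canwin(side)" idea, in Python for brevity. The post does not define canwin, so everything here is an assumption for illustration: the piece values, the Attacked record, the rule that a defended piece costs you the attacker, and the leaf_score helper are all hypothetical, not how Junior or any other program actually does it.

```python
from dataclasses import dataclass

# Conventional centipawn values; an assumption, not taken from the post.
PIECE_VALUE = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}


@dataclass(frozen=True)
class Attacked:
    """One entry in the small list of attacked enemy pieces (hypothetical)."""
    piece: str      # letter of the attacked piece
    attacker: str   # letter of the cheapest piece attacking it
    defended: bool  # True if the attacked piece is protected


def canwin(enemy_attacked):
    """Crude guess at the material the side to move wins with one capture.

    Kept deliberately simple: a hanging piece is won outright, a defended
    piece is assumed to cost the attacker in return. Never negative, since
    the side to move can always decline to capture.
    """
    best = 0
    for entry in enemy_attacked:
        gain = PIECE_VALUE[entry.piece]
        if entry.defended:
            gain -= PIECE_VALUE[entry.attacker]
        best = max(best, gain)
    return best


def leaf_score(static_eval, enemy_attacked):
    """Leaf value without any qsearch: static evaluation plus canwin."""
    return static_eval + canwin(enemy_attacked)
```

With a defended queen attacked by a knight and a hanging pawn in the list, this canwin picks the queen-for-knight trade (900 - 300) over the pawn. A real version would have to deal with pins, multiple attackers and so on, which is exactly what makes it less simple.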

By the way, this is already in a book I read by Jaap v/d Herik, written around
1984 or so...

>>>
>>>So a possible alternative to evaltable would be hashing in qsearch.
>>>
>>>Evaltable would be my guess though. A random lookup into a big hashtable on
>>>a 400Mhz dual Xeon costs around 400ns when the entry is not in the cache.
>>>
>>
>>
>>
>>Depends.  Use big memory pages and it costs 150 ns.  No TLB thrashing then.
>>
>
>Based on some reading that I did on the k8, it seemed that a memory lookup was
>around 150 cycles there. (And 250 cycles on k7.)
>
>Did I misunderstand? Or does this number change when you use multiple
>processors?
>
>If so, then hashing should be done differently on multiple processors than on
>single processors. For example, ETC would behave differently.
>
>Vas
>
>>
>>
>>>Even at 3Ghz that's just 1200 cycles.
>>>
>>>Assuming 1 mln nps at 3Ghz, 1 node costs 3000 cycles on average.
>>>
>>>The vast majority of nodes do not get evaluated at all, of course.
>>>
>>>That shows that Fritz' eval needs a multiple of that for evaluation nowadays.
>>>
>>>When not storing eval in transpositiontable but only in a special eval table,
>>>that will give a >= 50% lookuprate at evaltable (more likely 60%).
>>>
>>>So it makes sense to use an eval table for Fritz.
>>>
>>>Something crafty doesn't need as its eval is smaller than tiny.
>>
>>small != bad, however.
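
The cycle figures quoted above (400 ns and 150 ns per uncached lookup, 1 mln nps, a 3Ghz clock) can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for illustration and the inputs are the numbers from the posts, not new measurements.

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Cycles spent waiting: nanoseconds times cycles per nanosecond."""
    return latency_ns * clock_ghz


def cycles_per_node(nodes_per_second, clock_ghz):
    """Average cycle budget per node at a given search speed."""
    return clock_ghz * 1e9 / nodes_per_second


miss_cycles = ns_to_cycles(400, 3)             # uncached lookup, 400 ns
big_page_cycles = ns_to_cycles(150, 3)         # same lookup with big pages
node_budget = cycles_per_node(1_000_000, 3)    # 1 mln nps at 3Ghz

# Share of an average node's budget that one uncached lookup would eat.
miss_share = miss_cycles / node_budget
```

Under these assumptions one uncached lookup is a sizeable fraction of the average per-node budget, which is the comparison the eval table argument rests on.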