Computer Chess Club Archives


Subject: Re: Some new hyper-threading info.

Author: Vincent Diepeveen

Date: 09:45:23 04/15/04


On April 15, 2004 at 09:01:44, Robert Hyatt wrote:

>On April 15, 2004 at 06:05:15, Joachim Rang wrote:
>
>>On April 14, 2004 at 22:49:39, Robert Hyatt wrote:
>>
>>>I just finished some HT on / HT off tests to see how things have changed in
>>>Crafty since some of the recent NUMA-related memory changes that were made.
>>>
>>>Point 1.  HT now speeds Crafty up between 5 and 10% max.  A year ago this was
>>>30%.  What did I learn?  Nothing new.  Memory waits benefit HT.  Eugene and I
>>>worked on removing several shared memory interactions which led to better cache
>>>utilization, less cache invalidates (very slow) and improved performance a good
>>>bit.  But at the same time, now HT doesn't have the excessive memory waits it
>>>had before and so the speedup is not as good.
>>>
>>>Point 2.  HT now actually slows things down due to SMP overhead.  I.e., I lose 30%
>>>per CPU, roughly, due to SMP overhead.  HT now only gives 5-10% back.  This is a
>>>net loss.  I am now running my dual with HT disabled...
>>>
>>>More as I get more data...  Here are two data points, however:
>>>
>>>pos1.  cpus=2 (no HT)  NPS = 2.07M  time=18.13
>>>       cpus=4          NPS = 2.08M  time=28.76
>>>
>>>pos2.  cpus=2          NPS = 1.87M  time=58.48
>>>       cpus=4          NPS = 2.01M  time=66.00
>>>
>>>In the first position HT barely helps NPS and costs 10 seconds in search
>>>overhead.  Ugly.  Position 2 gives about 5% more NPS, but again the SMP overhead
>>>washes that out and there is a net loss.  I should run the speedup tests several
>>>times, since the speedup could change, but the NPS numbers don't change much.
>>>This is enough to make the point.
>>
>>
>>On a German board someone posted figures for the Fritzmark of Fritz 8. Fritz
>>still gains 25% from HT (in this specific position).
>>
>>cpus=2    NPS = 2.35
>>cpus=4    NPS = 2.95
>>
>>Unfortunately I have no information about search time.
>>
>>Does that mean Fritz 8 is poorly optimized?
>>
>>regards Joachim
>
>
>It means it has some cache issues that can be fixed to speed it up further, yes.

Not at all.

Fritz is currently hand-optimized P4 assembly. I expect its author to be working
hard on a hand-optimized Opteron assembly version of Fritz now (he has probably
been at it for a year already).

One possibility is that in recent years Fritz's evaluation function has become
so much slower than it was that it most likely needs an eval hashtable, just
like the one I have used in DIEP for many years.

My guess is that it just uses more hashtables than Crafty. Crafty isn't probing
in qsearch, for example. DIEP is. DIEP does a lot more in qsearch than Crafty
does, so using a transposition table there makes a lot more sense.

All commercial programs that I know of do checks in qsearch (Junior's search is
so different that I would bet it is not the case for Junior).

So a possible alternative to an eval table would be hashing in qsearch.

An eval table would be my guess, though. A random lookup into a big hashtable on
a 400MHz dual Xeon costs around 400ns when the entry is not in the cache.

Even at 3GHz that is just 1200 cycles.

Assuming 1 million nps at 3GHz, one node costs 3000 cycles on average.
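
The arithmetic above can be written out as a small Python sketch. The function
names are made up for illustration; the inputs (400ns, 3GHz, 1 million nps) are
the figures from this post.

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to clock cycles (1 GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


def cycles_per_node(nps, clock_ghz):
    """Average number of clock cycles spent per searched node."""
    return clock_ghz * 1e9 / nps


miss_cycles = ns_to_cycles(400, 3.0)           # one uncached hashtable lookup
node_cycles = cycles_per_node(1_000_000, 3.0)  # average cost of one node

print(f"uncached lookup: {miss_cycles:.0f} cycles")
print(f"average node:    {node_cycles:.0f} cycles")
print(f"lookup / node:   {miss_cycles / node_cycles:.0%}")
```

So a single uncached lookup is a sizeable fraction of the average node budget,
but still well below it.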

The vast majority of nodes do not get evaluated at all, of course.

That shows that Fritz's eval nowadays needs a multiple of that per evaluation.

When eval is stored not in the transposition table but only in a dedicated eval
table, that gives a hit rate of >= 50% in the eval table (more likely 60%).
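
For readers who have not seen one, a dedicated eval table can be sketched
roughly as below. This is a minimal illustration only, not how Fritz or DIEP do
it: the class name, the always-replace policy, the table size and the
`cached_eval` helper are all assumptions, and the key is assumed to be a 64-bit
position hash (e.g. a Zobrist key) computed elsewhere.

```python
class EvalTable:
    """Fixed-size, always-replace cache of static eval scores.

    Hypothetical sketch: entries are indexed by the low bits of a
    64-bit position hash and verified against the full key.
    """

    def __init__(self, size_log2=16):
        self.mask = (1 << size_log2) - 1
        self.keys = [None] * (1 << size_log2)
        self.scores = [0] * (1 << size_log2)
        self.probes = 0
        self.hits = 0

    def probe(self, key):
        """Return the cached score for key, or None on a miss."""
        self.probes += 1
        slot = key & self.mask
        if self.keys[slot] == key:
            self.hits += 1
            return self.scores[slot]
        return None

    def store(self, key, score):
        """Overwrite whatever currently occupies the slot for key."""
        slot = key & self.mask
        self.keys[slot] = key
        self.scores[slot] = score

    def hit_rate(self):
        return self.hits / self.probes if self.probes else 0.0


def cached_eval(table, key, evaluate):
    """Run the (expensive) evaluate() only when the table has no entry."""
    score = table.probe(key)
    if score is None:
        score = evaluate()
        table.store(key, score)
    return score
```

In a search, `cached_eval(table, position_key, lambda: evaluate(position))`
would replace the direct call to the evaluation, and `table.hit_rate()` is the
lookup rate discussed above.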

So it makes sense to use an eval table for Fritz.
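
Why the hit rate matters can be put as a back-of-the-envelope break-even. The
sketch below assumes the worst case, where every table probe misses the CPU
cache and costs the full 1200 cycles, and it ignores the cost of the store. The
function name and the cost model are illustrative assumptions, not measurements
of Fritz.

```python
def break_even_eval_cycles(lookup_cycles, hit_rate):
    """Eval cost (in cycles) above which an eval table starts to pay off.

    Without a table, every evaluated node pays eval_cycles.
    With a table, it pays the lookup plus the eval on a miss:
        lookup_cycles + (1 - hit_rate) * eval_cycles
    The table wins once hit_rate * eval_cycles > lookup_cycles.
    """
    return lookup_cycles / hit_rate


for rate in (0.5, 0.6):
    cycles = break_even_eval_cycles(1200, rate)
    print(f"hit rate {rate:.0%}: eval must cost more than {cycles:.0f} cycles")
```

Under these assumptions the table only helps when one evaluation costs a few
thousand cycles or more; for a very cheap eval the lookup costs more than it
saves.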

Something Crafty doesn't need, as its eval is smaller than tiny.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
