Computer Chess Club Archives


Subject: Re: Some new hyper-threading info.

Author: Vincent Diepeveen

Date: 11:22:13 04/16/04


On April 16, 2004 at 12:40:21, Robert Hyatt wrote:

>On April 16, 2004 at 10:05:04, Vincent Diepeveen wrote:
>
>>On April 15, 2004 at 13:10:26, Robert Hyatt wrote:
>>
>>>On April 15, 2004 at 12:45:23, Vincent Diepeveen wrote:
>>>
>>>>On April 15, 2004 at 09:01:44, Robert Hyatt wrote:
>>>>
>>>>>On April 15, 2004 at 06:05:15, Joachim Rang wrote:
>>>>>
>>>>>>On April 14, 2004 at 22:49:39, Robert Hyatt wrote:
>>>>>>
>>>>>>>I just finished some HT on / HT off tests to see how things have changed in
>>>>>>>Crafty since some of the recent NUMA-related memory changes that were made.
>>>>>>>
>>>>>>>Point 1.  HT now speeds Crafty up between 5 and 10% max.  A year ago this was
>>>>>>>30%.  What did I learn?  Nothing new.  Memory waits benefit HT.  Eugene and I
>>>>>>>worked on removing several shared memory interactions which led to better cache
>>>>>>>utilization, fewer cache invalidates (very slow) and improved performance a good
>>>>>>>bit.  But at the same time, now HT doesn't have the excessive memory waits it
>>>>>>>had before and so the speedup is not as good.
>>>>>>>
>>>>>>>Point 2.  HT now actually slows things down due to SMP overhead.  I.e., I lose 30%
>>>>>>>per CPU, roughly, due to SMP overhead.  HT now only gives 5-10% back.  This is a
>>>>>>>net loss.  I am now running my dual with HT disabled...
>>>>>>>
>>>>>>>More as I get more data...  Here are two data points, however:
>>>>>>>
>>>>>>>pos1.  cpus=2 (no HT)  NPS = 2.07M  time=18.13
>>>>>>>       cpus=4          NPS = 2.08M  time=28.76
>>>>>>>
>>>>>>>pos2.  cpus=2          NPS = 1.87M  time=58.48
>>>>>>>       cpus=4          NPS = 2.01M  time=66.00
>>>>>>>
>>>>>>>In the first position HT barely helps NPS and costs 10 seconds in search overhead.
>>>>>>>Ugly.  Position 2 gives about 5% more nps, but again the SMP overhead washes
>>>>>>>that out and there is a net loss.  I should run the speedup tests several times,
>>>>>>>but the NPS numbers don't change much, and the speedup could change.  But this
>>>>>>>offers enough..
>>>>>>
>>>>>>
>>>>>>On a German board someone posted figures for the Fritzmark of Fritz 8. Fritz
>>>>>>still gains 25% from HT (in this specific position):
>>>>>>
>>>>>>cpus=2    NPS = 2.35
>>>>>>cpus=4    NPS = 2.95
>>>>>>
>>>>>>I have unfortunately no information about search time.
>>>>>>
>>>>>>Does that mean Fritz 8 is poorly optimized?
>>>>>>
>>>>>>regards Joachim
>>>>>
>>>>>
>>>>>It means it has some cache issues that can be fixed to speed it up further, yes.
>>>>
>>>>Not at all.
>>>>
>>>>Fritz is currently hand-optimized P4 assembly. I expect its author to be working
>>>>hard on a hand-optimized Opteron assembly version of Fritz now (probably already
>>>>1 year into it by now).
>>>
>>>Sorry, but you should stick to topics you know something about.  SMT works best
>>
>>I guess this is your way of saying: "sorry, I did not consider that it was a more
>>efficient program than Crafty, and that the better SMT gain was caused by more hash
>>lookups than I had realized could be profitable".
>>
>>>in programs where there are memory reads/writes that stall a thread.  As you
>>>work out those stalls, SMT pays off less gain.  My current numbers clearly show
>>>this as opposed to the numbers I (and others) posted when I first got my SMT
>>>box...
>>
>>You do 1 lookup to RAM. He's doing perhaps 3 lookups.
>>
>>You should do your math better before commenting on Fritz being inefficiently
>>programmed.
>
>Why don't you quote _exactly_ where I said that.
>
>Then we can start the _real_ conversation.
>
>Hint:
>
>I said "Fritz has some cache issues."  That is _all_ I said.  Your hyperbole
>turned that into "inefficiently programmed" as your hyperbole always changes
>everybody's statements...

You implicitly, and clearly to everybody, suggest that it has fixable problems,
just like you 'fixed' them in Crafty.

This even though it is written in assembly for a P4 with 512KB cache, and you had
not even considered that it might do more hashtable lookups than your Crafty.
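
For reference, the numbers quoted in this thread can be checked with a few lines of Python. This is only a sketch of the arithmetic: the figures are copied from the quoted posts, the helper names (pct_change, summarize) are made up for illustration, and "cpus=4" is taken to mean the dual with HT enabled.

```python
# Figures copied from the quoted posts; helper names are illustrative only.

def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Crafty on the dual: (NPS in millions, time in seconds).
# "ht_off" is cpus=2, "ht_on" is cpus=4 (assumed to be the same box with HT on).
crafty = {
    "pos1": {"ht_off": (2.07, 18.13), "ht_on": (2.08, 28.76)},
    "pos2": {"ht_off": (1.87, 58.48), "ht_on": (2.01, 66.00)},
}

def summarize(runs):
    """Return {position: (NPS change %, time change %)} for HT on vs. HT off."""
    out = {}
    for name, r in runs.items():
        nps_off, t_off = r["ht_off"]
        nps_on, t_on = r["ht_on"]
        out[name] = (pct_change(nps_off, nps_on), pct_change(t_off, t_on))
    return out

# Fritz 8 Fritzmark figures as posted (units were not given in the post).
fritz_gain = pct_change(2.35, 2.95)

if __name__ == "__main__":
    for name, (nps, t) in summarize(crafty).items():
        print(f"{name}: NPS {nps:+.1f}%, time {t:+.1f}%")
    print(f"Fritz 8: NPS {fritz_gain:+.1f}%")
```

A positive time change means the search took longer with HT on, so in both Crafty positions the small NPS gain is outweighed by the extra search time, while the Fritz figures work out to roughly the 25% NPS gain quoted.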





Last modified: Thu, 15 Apr 21 08:11:13 -0700