Computer Chess Club Archives

Subject: Re: Some new hyper-threading info.

Author: Robert Hyatt
Date: 19:50:24 04/16/04
On April 16, 2004 at 14:22:13, Vincent Diepeveen wrote:

>On April 16, 2004 at 12:40:21, Robert Hyatt wrote:
>
>>On April 16, 2004 at 10:05:04, Vincent Diepeveen wrote:
>>
>>>On April 15, 2004 at 13:10:26, Robert Hyatt wrote:
>>>
>>>>On April 15, 2004 at 12:45:23, Vincent Diepeveen wrote:
>>>>
>>>>>On April 15, 2004 at 09:01:44, Robert Hyatt wrote:
>>>>>
>>>>>>On April 15, 2004 at 06:05:15, Joachim Rang wrote:
>>>>>>
>>>>>>>On April 14, 2004 at 22:49:39, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>I just finished some HT on / HT off tests to see how things have changed in
>>>>>>>>Crafty since some of the recent NUMA-related memory changes that were made.
>>>>>>>>
>>>>>>>>Point 1.  HT now speeds Crafty up between 5 and 10% max.  A year ago this was
>>>>>>>>30%.  What did I learn?  Nothing new.  Memory waits benefit HT.  Eugene and I
>>>>>>>>worked on removing several shared memory interactions which led to better cache
>>>>>>>>utilization, fewer cache invalidates (very slow) and improved performance a good
>>>>>>>>bit.  But at the same time, now HT doesn't have the excessive memory waits it
>>>>>>>>had before and so the speedup is not as good.
>>>>>>>>
>>>>>>>>Point 2.  HT now actually slows things down due to SMP overhead.  IE I lose 30%
>>>>>>>>per CPU, roughly, due to SMP overhead.  HT now only gives 5-10% back.  This is a
>>>>>>>>net loss.  I am now running my dual with HT disabled...
>>>>>>>>
>>>>>>>>More as I get more data...  Here are two data points, however:
>>>>>>>>
>>>>>>>>pos1.  cpus=2 (no HT)  NPS = 2.07M  time=18.13
>>>>>>>>       cpus=4          NPS = 2.08M  time=28.76
>>>>>>>>
>>>>>>>>pos2.  cpus=2          NPS = 1.87M  time=58.48
>>>>>>>>       cpus=4          NPS = 2.01M  time=66.00
>>>>>>>>
>>>>>>>>First pos HT helps almost none in NPS, costs 10 seconds in search overhead.
>>>>>>>>Ugly.  Position 2 gives about 5% more nps, but again the SMP overhead washes
>>>>>>>>that out and there is a net loss.  I should run the speedup tests several times,
>>>>>>>>but the NPS numbers don't change much, and the speedup could change.  But this
>>>>>>>>offers enough..
>>>>>>>
>>>>>>>
>>>>>>>On a German board someone posted figures for the Fritzmark of Fritz 8. Fritz
>>>>>>>still gains 25% from HT (in this specific position)
>>>>>>>
>>>>>>>cpus=2    NPS = 2.35
>>>>>>>cpus=4    NPS = 2.95
>>>>>>>
>>>>>>>I have unfortunately no information about search time.
>>>>>>>
>>>>>>>Does that mean Fritz 8 is poorly optimized?
>>>>>>>
>>>>>>>regards Joachim
>>>>>>
>>>>>>
>>>>>>It means it has some cache issues that can be fixed to speed it up further, yes.
>>>>>
>>>>>Not at all.
>>>>>
>>>>>Fritz is currently hand-optimized P4 assembly. I expect him to be working hard on a
>>>>>hand-optimized Opteron assembly version of Fritz now (probably already 1 year
>>>>>into it by now).
>>>>
>>>>Sorry, but you should stick to topics you know something about.  SMT works best
>>>
>>>I guess this is your way of saying: "sorry, I did not consider that it was a more
>>>efficient program than Crafty, and that its better SMT gain came from doing more
>>>hash lookups than I had assumed could be profitable".
>>>
>>>>in programs where there are memory reads/writes that stall a thread.  As you
>>>>work out those stalls, SMT pays off less gain.  My current numbers clearly show
>>>>this as opposed to the numbers I (and others) posted when I first got my SMT
>>>>box...
>>>
>>>You do 1 lookup to RAM. He's doing perhaps 3 lookups.
>>>
>>>You should do your math better before commenting on Fritz being inefficiently
>>>programmed.
>>
>>Why don't you quote _exactly_ where I said that.
>>
>>Then we can start the _real_ conversation.
>>
>>Hint:
>>
>>I said "Fritz has some cache issues."  That is _all_ I said.  Your hyperbole
>>turned that into "inefficiently programmed" as your hyperbole always changes
>>everybody's statements...
>
>You implicitly, and clearly to everybody, suggest that it has fixable problems, just
>like you 'fixed' them in Crafty.

So?

I've written my fair share of asm stuff over the years.  _each_ new
architectural change introduces new performance considerations.  Doesn't mean
the old program was "inefficiently programmed."  Just means it is not yet
optimized for the _new_ platform...

>
>This while it is written in assembly for a P4 with 512KB cache, and you had not
>even considered that it might do more hashtable lookups than your Crafty.

Again, so?  What does random information have to do with anything here?
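
For anyone who wants to redo the arithmetic on the two Crafty data points quoted at the top of the thread, here is a minimal sketch. The NPS and time figures are copied from the post; the names (`Run`, `compare`, `POSITIONS`) are made up for illustration and are not part of Crafty. The node totals are an estimate only, assuming the reported NPS is an average over the whole search so that nodes is roughly NPS times time.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One timed search: speed in millions of nodes/sec, wall time in seconds."""
    nps_millions: float
    seconds: float

    @property
    def nodes_millions(self) -> float:
        # Estimate only: assumes the reported NPS is the average over the search.
        return self.nps_millions * self.seconds


def compare(ht_off: Run, ht_on: Run) -> dict:
    """Compare a cpus=2 (HT off) run against a cpus=4 (HT on) run."""
    return {
        # Raw speed gained by turning HT on, in percent.
        "nps_gain_pct": (ht_on.nps_millions / ht_off.nps_millions - 1.0) * 100.0,
        # Above 1.0 means the HT-on run took longer to finish.
        "time_ratio": ht_on.seconds / ht_off.seconds,
        # Above 1.0 means the HT-on search visited a larger tree (search overhead).
        "node_ratio": ht_on.nodes_millions / ht_off.nodes_millions,
    }


# Figures as quoted in the original post: (HT off, cpus=2), (HT on, cpus=4).
POSITIONS = {
    "pos1": (Run(2.07, 18.13), Run(2.08, 28.76)),
    "pos2": (Run(1.87, 58.48), Run(2.01, 66.00)),
}

for name, (off, on) in POSITIONS.items():
    r = compare(off, on)
    print(
        f"{name}: NPS {r['nps_gain_pct']:+.1f}%, "
        f"time x{r['time_ratio']:.2f}, nodes x{r['node_ratio']:.2f}"
    )
```

In both positions the time ratio comes out above 1.0 even though NPS goes up, which is the net loss described in Point 2: the extra threads search a larger tree, and the small NPS gain does not cover it.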