Computer Chess Club Archives

Subject: Re: Some new hyper-threading info.

Author: Robert Hyatt

Date: 06:06:19 04/15/04

On April 15, 2004 at 05:57:31, Joachim Rang wrote:

>On April 14, 2004 at 22:49:39, Robert Hyatt wrote:
>
>>I just finished some HT on / HT off tests to see how things have changed in
>>Crafty since some of the recent NUMA-related memory changes that were made.
>>
>>Point 1.  HT now raises Crafty's NPS by between 5 and 10% max.  A year ago this was
>>30%.  What did I learn?  Nothing new.  Memory waits benefit HT.  Eugene and I
>>worked on removing several shared memory interactions which led to better cache
>>utilization, fewer cache invalidations (very slow), and improved performance a good
>>bit.  But at the same time, now HT doesn't have the excessive memory waits it
>>had before and so the speedup is not as good.
>>
>>Point 2.  HT now actually slows things down due to SMP overhead.  I.e., I lose 30%
>>per CPU, roughly, due to SMP overhead.  HT now only gives 5-10% back.  This is a
>>net loss.  I am now running my dual with HT disabled...
>>
>>More as I get more data...  Here are two data points, however:
>>
>>pos1.  cpus=2 (HT off)  NPS = 2.07M  time=18.13
>>       cpus=4 (HT on)   NPS = 2.08M  time=28.76
>>
>>pos2.  cpus=2 (HT off)  NPS = 1.87M  time=58.48
>>       cpus=4 (HT on)   NPS = 2.01M  time=66.00
>>
>>In the first position HT adds almost nothing in NPS and costs about 10 seconds
>>in search overhead.  Ugly.  Position 2 gives about 7% more NPS, but again the
>>SMP overhead washes that out and there is a net loss.  I should run the speedup
>>tests several times, since the NPS numbers don't change much but the speedup
>>could.  But this offers enough.
>
>
>Thanks, Bob, for sharing that information. Does that mean a poorly optimized
>engine should gain more from HT, perhaps even to the point that it pays off?
>
>Can you give an estimate of Crafty's gain in NPS and search time on a quad and
>an 8-way system?
>
>thanks in advance
>
>regards Joachim

Think about this:

In an operating system, we gain by running two processes at the same time only
when both do some I/O so that there is overlap.  No I/O and things run no
faster.  Ditto for SMT.  The more one thread blocks waiting on memory, the more
advantage you see with SMT since a second thread can eat the CPU cycles that
would normally be lost waiting on memory loads/stores.
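
To put a rough number on that argument, here is a minimal sketch of a toy model. It is an illustration, not a measurement and not anything taken from Crafty. The function name `smt_speedup_upper_bound` and the parameter `stall_fraction` are made up for this example, and the model assumes that the second thread fills every stalled cycle perfectly and that the two threads do not slow each other down through cache or execution-unit contention. Real gains will be lower.

```python
def smt_speedup_upper_bound(stall_fraction: float) -> float:
    """Idealized throughput gain from running two SMT threads on one core.

    Assumption (not from the post): a thread running alone keeps the core
    busy for (1 - stall_fraction) of the time and waits on memory for the
    rest, and a second thread can use every one of those waiting cycles.
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    busy_alone = 1.0 - stall_fraction
    # Two threads together can never use more than 100% of the core.
    busy_together = min(1.0, 2.0 * busy_alone)
    return busy_together / busy_alone


if __name__ == "__main__":
    for stall in (0.0, 0.05, 0.10, 0.25, 0.50):
        gain = smt_speedup_upper_bound(stall)
        print(f"stall={stall:.2f}  upper bound on SMT gain = {gain:.2f}x")
```

With no memory stalls the bound is exactly 1.0x (no gain), and it grows as the stall fraction grows, which is the same direction as the HT numbers before and after the memory changes.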

On an 8-way Opteron the NPS is about 8x.  Speedup numbers were as I have
posted in the past, i.e. about 6x or a little less...
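
For anyone who wants to redo the arithmetic, here is a small sketch that works out the relative NPS gain and the relative time cost from the two quoted positions, plus the search efficiency implied by "8x NPS, about 6x speedup". The numbers are copied from the posts above; the names `RESULTS`, `ht_effect` and `search_efficiency` are just made up for this example, and defining efficiency as time speedup divided by NPS speedup is an assumption of the sketch.

```python
# (NPS in millions, time in seconds) as quoted in the posts above.
RESULTS = {
    "pos1": {"ht_off": (2.07, 18.13), "ht_on": (2.08, 28.76)},
    "pos2": {"ht_off": (1.87, 58.48), "ht_on": (2.01, 66.00)},
}


def ht_effect(ht_off, ht_on):
    """Return (relative NPS gain, relative time increase) of HT on vs. HT off."""
    nps_off, time_off = ht_off
    nps_on, time_on = ht_on
    return nps_on / nps_off - 1.0, time_on / time_off - 1.0


def search_efficiency(time_speedup, nps_speedup):
    """Fraction of the raw NPS gain that shows up as real time speedup."""
    return time_speedup / nps_speedup


if __name__ == "__main__":
    for name, runs in RESULTS.items():
        nps_gain, time_cost = ht_effect(runs["ht_off"], runs["ht_on"])
        print(f"{name}: NPS {nps_gain:+.1%}, time {time_cost:+.1%}")
    # 8-way figures: about 8x NPS, about 6x speedup (an upper figure,
    # since the post says "6x or a little less").
    print(f"8-way search efficiency: about {search_efficiency(6.0, 8.0):.0%}")
```

A positive time figure means the search took longer with HT on, so in both positions the small NPS gain is more than eaten by the extra search overhead.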

Last modified: Thu, 15 Apr 21 08:11:13 -0700