Computer Chess Club Archives


Subject: Re: Odd hyperthreading behavior

Author: Robert Hyatt

Date: 06:43:55 10/06/03


On October 06, 2003 at 06:03:21, Tom Kerrigan wrote:

>On October 04, 2003 at 23:42:01, Robert Hyatt wrote:
>
>>On October 04, 2003 at 21:00:34, Tom Kerrigan wrote:
>>
>>>I had the chance to run my program on a dual P4 Xeon (with hyperthreading).
>>>
>>>First off, there have been some involved arguments about the design and
>>>performance of hyperthreading on this board in the past. I'd like to settle one
>>>argument, namely that single threaded programs do not slow down when
>>>hyperthreading is on. Actually, my program did slow down by 1.3% but I think
>>>this is marginal and easily attributed to the scheduler, not hyperthreading.
>>>
>>>The odd part is that hyperthreading DOES slow down my program when running 2
>>>threads. With HT off, my program searches 90% more NPS with a 2nd thread. With
>>>HT on, it only searches 53% more NPS. The idle time reported by each thread is
>>>low and the nodes are split evenly, so it seems both processors are slowed down
>>>equally. What must be happening is that HT is activated some (or all?) of the
>>>time while searching but I have no idea what might be activating it.
>>
>>
>>Your explanation is not very clear.  You have a dual.  Did you run two
>>threads with HT on?  Which means that the two threads might run on two
>
>With it on and off. I think I made that pretty clear in my post.

Yes, but you didn't make it clear how you ran the test.  I.e., I _think_
you ran two threads on a machine with SMT off, then two threads on a machine
with SMT on.  That invites the problem I have discussed here many times: no
current O/S (except for a Windows .NET kernel or a Linux kernel that is
specifically patched to address this) understands that two threads need to
run on two physical processors if possible.  When you run two threads on a
dual with SMT on, the first thread lands on one physical processor by
definition.  The O/S now has three remaining logical processors, two of which
are on a different physical processor from the first thread.  If the scheduler
picks among them without regard to topology, you have a 2/3 chance of hitting
two different physical processors, and a 1/3 chance of hitting two logical
processors on the same physical processor.  Your node count will vary wildly
from run to run.
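
The 2/3 vs. 1/3 figures can be checked by enumerating the placements. The
sketch below is only an illustration: it assumes four logical processors
numbered 0-3 with 0/1 and 2/3 as the SMT siblings on each physical package
(the numbering and the name `PACKAGE_OF` are made up for the example), and it
assumes the scheduler places the two threads uniformly at random, which real
schedulers need not do.

```python
from fractions import Fraction
from itertools import permutations

# Assumed topology (hypothetical numbering): logical CPU -> physical package.
PACKAGE_OF = {0: 0, 1: 0, 2: 1, 3: 1}

def placement_odds(package_of):
    """Return (P(different packages), P(same package)) for two threads
    placed on two distinct logical CPUs chosen uniformly at random."""
    # Every ordered (first thread, second thread) placement is equally likely.
    pairs = list(permutations(package_of, 2))
    split = sum(1 for a, b in pairs if package_of[a] != package_of[b])
    total = len(pairs)
    return Fraction(split, total), Fraction(total - split, total)

different, same = placement_odds(PACKAGE_OF)
print(different, same)  # prints: 2/3 1/3
```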

Ingo Molnar discussed this problem a good while back and he (perhaps with
someone else) wrote scheduler patches to fix it.  Someone ran a bunch of
tests and found that the patched kernel totally solved the problem.  The
problem only occurs with two threads on a dual with SMT on, and the patch
guarantees that if there are only two runnable processes, they run on separate
physical processors all the time.
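
Without a patched kernel, a program can approximate the same placement from
user space by pinning each search thread to a logical processor on a
different physical package. The sketch below is not the scheduler patch
described above, just a workaround under stated assumptions: it assumes a
Linux system that exposes `topology/physical_package_id` under
`/sys/devices/system/cpu`, and a Python build that provides
`os.sched_setaffinity` (Linux only). The function names are made up for the
example.

```python
import os

def one_cpu_per_package(package_of):
    """Pick the lowest-numbered logical CPU from each physical package."""
    chosen = {}
    for cpu in sorted(package_of):
        # Keep only the first logical CPU seen for each package.
        chosen.setdefault(package_of[cpu], cpu)
    return sorted(chosen.values())

def read_linux_topology(base="/sys/devices/system/cpu"):
    """Map logical CPU -> physical package id (assumes the Linux sysfs layout)."""
    topology = {}
    for name in os.listdir(base):
        if name.startswith("cpu") and name[3:].isdigit():
            path = os.path.join(base, name, "topology", "physical_package_id")
            try:
                with open(path) as f:
                    topology[int(name[3:])] = int(f.read())
            except (OSError, ValueError):
                pass  # offline CPU, or no topology information exposed
    return topology

def pin_current_thread(cpu):
    """Restrict the calling thread to a single logical CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})
```

With two search threads, thread i would call `pin_current_thread(cpus[i])`
at startup, where `cpus = one_cpu_per_package(read_linux_topology())`. Hard
pinning gives up the scheduler's freedom to migrate threads, so it is best
suited to a dedicated test machine.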

I have used this kernel patch myself.

>
>>different physical processors or two logical processors on the same
>>physical CPU.
>>
>>process schedulers are not yet doing this right.  There are patches for linux
>>that I have tried and which work, but current stable kernels do not handle SMT
>>correctly when you have two physical processors, four logical processors, and
>>you run two threads.  It should run two threads on two different physical
>>processors but current schedulers don't do this, linux or windows.
>
>I can understand how you can say this with certainty about Linux, but
>Windows...?


Microsoft has said this.  "Windows .net correctly handles SMT ..."
I called tech support, as we have several dual Xeons running Windows 2000
and Windows XP Pro.  They said that we had to wait for .NET to solve the
problem.



>
>>>Also odd is that HT seems to be decreasing the efficiency of the search. With HT
>>>off, my program's time-to-ply is 64% faster with 2 threads but with HT on, it's
>>>only 21% faster. The time-to-ply:NPS ratios are 0.86 and 0.79 respectively.
>>>
>>>Running 4 threads with HT on results in a 15% NPS/6% time-to-ply speedup over 2
>>>threads.
>>>
>>>In other words, there's no contest between running 2 threads (HT off) vs.
>>>running 4 threads (HT on). The former wins hands down for my program.
>>>
>>>-Tom
>>
>>
>>That's different from my results.  4 threads SMT on is 20-30% faster for me.
>
>I imagine Crafty is more memory intensive than my program, with the bitboards.
>That gives more opportunities for HT...
>
>-Tom


Very possibly.  I was just pointing out that the difference is significant,
and that it varies from program to program, as we have seen reported previously.
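
For reference, the speedup figures Tom quoted above can be checked with a few
lines of arithmetic. The sketch assumes "N% faster" and "N% more NPS" both
mean a speedup factor of 1 + N/100 relative to one thread, and that the
4-thread numbers are relative to 2 threads with HT on; the variable names are
made up for the example.

```python
# Speedup factors relative to one thread, taken from the quoted figures.
nps = {"ht_off_2": 1.90, "ht_on_2": 1.53}   # nodes per second
ttp = {"ht_off_2": 1.64, "ht_on_2": 1.21}   # time-to-ply

def efficiency(config):
    """Time-to-ply speedup divided by NPS speedup for one configuration."""
    return ttp[config] / nps[config]

# 4 threads with HT on: 15% NPS and 6% time-to-ply over 2 threads with HT on.
nps_4 = nps["ht_on_2"] * 1.15
ttp_4 = ttp["ht_on_2"] * 1.06

print(round(efficiency("ht_off_2"), 2))  # prints: 0.86
print(round(efficiency("ht_on_2"), 2))   # prints: 0.79
# Under these assumptions 4 threads (HT on) still trail 2 threads (HT off).
print(nps_4 < nps["ht_off_2"], ttp_4 < ttp["ht_off_2"])  # prints: True True
```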




