Computer Chess Club Archives

Subject: Re: Here are some actual numbers

Author: Robert Hyatt
Date: 09:13:35 04/16/03

On April 16, 2003 at 01:37:08, Tom Kerrigan wrote:

>On April 16, 2003 at 00:01:34, Robert Hyatt wrote:
>
>>On April 15, 2003 at 14:56:38, Tom Kerrigan wrote:
>>
>>>On April 14, 2003 at 23:41:10, Robert Hyatt wrote:
>>>
>>>>On April 14, 2003 at 18:19:28, Ricardo Gibert wrote:
>>>>
>>>>>On April 14, 2003 at 17:54:14, Robert Hyatt wrote:
>>>>>
>>>>>>On April 14, 2003 at 15:50:22, Tom Kerrigan wrote:
>>>>>>
>>>>>>>On April 13, 2003 at 11:21:51, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On April 13, 2003 at 02:37:57, Tom Kerrigan wrote:
>>>>>>>>
>>>>>>>>>On April 13, 2003 at 01:04:52, Robert Hyatt wrote:
>>>>>>>>>
>>>>>>>>>>It _is_ pinned on SMT.  The two logical processors are producing wildly
>>>>>>>>>>imbalanced results when using threads, vs using two separate processes.  It
>>>>>>>>>>would appear to be cache-related...
>>>>>>>>>
>>>>>>>>>This is some sort of joke, right? You and Vincent see the same behavior, you
>>>>>>>>>have SMT and Vincent doesn't, and somehow the problem is with SMT?
>>>>>>>>
>>>>>>>>The _variability_ is with SMT.  What are you talking about?  I reported _two_
>>>>>>>>issues.
>>>>>>>>
>>>>>>>>1.  My dual xeon runs two copies of crafty about 2x as fast as if they were
>>>>>>>>run one after the other.  So does my quad 700.
>>>>>>>>
>>>>>>>>2.  My dual xeon runs one copy, two threads, at about 1.5X the speed that it
>>>>>>>>should.
>>>>>>>>
>>>>>>>>That is a problem.
>>>>>>>>
>>>>>>>>The second issue is that my dual xeon does _not_ run threaded crafty in a
>>>>>>>>balanced way on two logical processors.   For two independent copies, it
>>>>>>>>varies from 50-50 to 45-55.  Not unreasonable.  But for the single copy
>>>>>>>>running two threads, it varies all the way to 70-30.  _That_ is an SMT issue.  Probably, as
>>>>>>>>I mentioned, caused by some unknown L2 cache issue.  But it _is_ a problem
>>>>>>>>with SMT if you want to assume that normally it is about 50-50 roughly, for
>>>>>>>>_regular_ applications.
>>>>>>>>
>>>>>>>>Shared memory, locks, etc. are causing something strange to happen.
>>>>>>>
>>>>>>>It looks like you're having enough problems and unexplained behavior already
>>>>>>>that it's hard to trust any sort of numbers you post. But still, if the widest
>>>>>>>disparity you measured was 70-30, that seems like enough to dispel your notion
>>>>>>>that one thread always gets priority over the other.
>>>>>>>
>>>>>>>-Tom
>>>>>>
>>>>>>
>>>>>>How?
>>>>>>
>>>>>>70-30 is > 2:1.
>>>>>>
>>>>>>Something is going on.
>>>>>
>>>>>
>>>>>If the worst you could do by flipping a coin 1 million times is to get heads 70%
>>>>>of the time, one should conclude the coin is unbiased? I don't think so. You're
>>>>>right to think 70-30 is a significant result. There is some asymmetry (a bug?)
>>>>>going on where none is expected.
>>>>
>>>>My original idea was that somewhere along the way, you _must_ make a decision
>>>>about which of two things to do next.  Flipping a bit complicates the process
>>>>if it has to be done many times.  The old Cray did it by using the processor
>>>>ID to break ties.  It is possible that somewhere in the PIV core, there is
>>>>a tie-break that is not 50-50.  It is also possible that the  results I am
>>>>getting are somehow wrong...
>>>
>>>"Tie breaking" is not the issue. If you read the thing Anthony posted, you'd
>>>know that all the P4's resources are evenly divided: instruction window, reorder
>>>registers, reorder buffer, load/store queues, everything. How can you have
>>>unbalanced execution when each thread gets half of everything? I don't think you
>>>could if you wanted.
>>>
>>>When I suggested that you and Vincent had the same problem, with unbalanced
>>>processing, you bitched that you had really reported two problems. Well, that's
>>>not so clear to me. If your dual proc machine is searching 1.5x the NPS that it
>>>should, is it unreasonable to think that maybe one of the processors is somehow
>>>idle (spinning) half the time? Because if that's the case, running both threads
>>>on a HT processor could easily result in a 66-33 disparity in NPS per thread,
>>>which is pretty damn close to what you're seeing.
>>>
>>>Occam's razor, Bob.
>>>
>>>-Tom
>>
>>
>>But the principle doesn't apply here.  Why?  Because I _know_ that one thread
>>is not "spinning" or "waiting" whatsoever.  I _carefully_ account for every
>>spin while waiting on work.  It averages about 5% out of 400% on a 3 minute
>>search.
>>
>>There is more going on than meets the eye, from several perspectives.
>
>Sure. What's more likely?
>
>1. Your code that accounts for spins is not working right for some reason.

Probability zero.  Because it works on _every_ other machine here.


>2. Your splitting algorithm sends all the slow nodes to the 2nd thread.

I don't see any "slow" nodes so I don't know what that means...


>3. Your 2nd processor underclocks itself by 1/2 when it detects that it's
>running a Crafty thread.

Or cosmic rays...

>4. You really are searching those nodes, but due to a bug from overclocking, the
>ALU doesn't increment the node counter.
>5.
>

Or, most likely, by a 100:1 probability, it is "none of the above."


>-Tom
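
For reference, the 66-33 figure in Tom's argument quoted above is plain arithmetic, and it can be checked with a few lines of Python. This is only a sketch under a simplifying assumption that is not from the thread: nodes searched are taken to be proportional to the fraction of time a thread spends doing useful work. The function names are hypothetical. It also shows how far the measured spin time (about 5% out of 400%) is from the 50% idle time that hypothesis would need.

```python
def per_thread_share(busy_fractions):
    """Each thread's share of all nodes searched.

    Assumption (not from the thread): nodes searched are proportional
    to the fraction of time a thread does useful work.
    """
    total = sum(busy_fractions)
    return [b / total for b in busy_fractions]


def speedup(busy_fractions):
    """Speedup over one fully busy thread, under the same assumption."""
    return sum(busy_fractions)


# Tom's hypothesis: one thread always busy, the other idle half the time.
busy = [1.0, 0.5]
print(speedup(busy))                                       # 1.5
print([round(100 * s, 1) for s in per_thread_share(busy)])  # [66.7, 33.3]

# The spin time reported in the post: about 5% out of a possible 400%.
measured_spin = 5 / 400
print(f"{measured_spin:.2%} of total capacity spent spinning")  # 1.25%
```

Under this model the hypothesis does reproduce both the 1.5x speed and a split close to 70-30, but only if one thread really is idle about half the time, which is what the spin accounting contradicts.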
Last modified: Thu, 15 Apr 21 08:11:13 -0700