Computer Chess Club Archives


Subject: Re: Here are some actual numbers

Author: Tom Kerrigan

Date: 22:37:08 04/15/03


On April 16, 2003 at 00:01:34, Robert Hyatt wrote:

>On April 15, 2003 at 14:56:38, Tom Kerrigan wrote:
>
>>On April 14, 2003 at 23:41:10, Robert Hyatt wrote:
>>
>>>On April 14, 2003 at 18:19:28, Ricardo Gibert wrote:
>>>
>>>>On April 14, 2003 at 17:54:14, Robert Hyatt wrote:
>>>>
>>>>>On April 14, 2003 at 15:50:22, Tom Kerrigan wrote:
>>>>>
>>>>>>On April 13, 2003 at 11:21:51, Robert Hyatt wrote:
>>>>>>
>>>>>>>On April 13, 2003 at 02:37:57, Tom Kerrigan wrote:
>>>>>>>
>>>>>>>>On April 13, 2003 at 01:04:52, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>It _is_ pinned on SMT.  The two logical processors are producing wildly
>>>>>>>>>imbalanced results when using threads, vs using two separate processes.  It
>>>>>>>>>would appear to be cache-related...
>>>>>>>>
>>>>>>>>This is some sort of joke, right? You and Vincent see the same behavior, you
>>>>>>>>have SMT and Vincent doesn't, and somehow the problem is with SMT?
>>>>>>>.
>>>>>>>
>>>>>>>
>>>>>>>The _variability_ is with SMT.  What are you talking about?  I reported _two_
>>>>>>>issues.
>>>>>>>
>>>>>>>1.  My dual xeon runs two copies of crafty about 2x as fast as if they were
>>>>>>>run one after the other.  So does my quad 700.
>>>>>>>
>>>>>>>2.  My dual xeon runs one copy, two threads, at about 1.5X the speed that it
>>>>>>>should.
>>>>>>>
>>>>>>>That is a problem.
>>>>>>>
>>>>>>>The second issue is that my dual xeon does _not_ run threaded crafty in a
>>>>>>>balanced way on two logical processors.   For two independent copies, it
>>>>>>>varies from 50-50 to 45-55.  Not unreasonable.  But for the single threaded
>>>>>>>copy, it varies all the way to 70-30.  _that_ is an SMT issue.  Probably, as
>>>>>>>I mentioned, caused by some unknown L2 cache issue.  But it _is_ a problem
>>>>>>>with SMT if you want to assume that normally it is about 50-50 roughly, for
>>>>>>>_regular_ applications.
>>>>>>>
>>>>>>>shared memory, locks, etc are causing something strange to happen.
>>>>>>
>>>>>>It looks like you're having enough problems and unexplained behavior already
>>>>>>that it's hard to trust any sort of numbers you post. But still, if the widest
>>>>>>disparity you measured was 70-30, that seems like enough to dispel your notion
>>>>>>that one thread always gets priority over the other.
>>>>>>
>>>>>>-Tom
>>>>>
>>>>>
>>>>>How?
>>>>>
>>>>>70-30 is > 2:1.
>>>>>
>>>>>Something is going on.
>>>>
>>>>
>>>>If the worst you could do by flipping a coin 1 million times is to get heads 70%
>>>>of the time, one should conclude the coin is unbiased? I don't think so. You're
>>>>right to think 70-30 is a significant result. There is some asymmetry (a bug?)
>>>>going on where none is expected.
>>>
>>>My original idea was that somewhere along the way, you _must_ make a decision
>>>about which of two things to do next.  Flipping a bit complicates the process
>>>if it has to be done many times.  The old Cray did it by using the processor
>>>ID to break ties.  It is possible that somewhere in the PIV core, there is
>>>a tie-break that is not 50-50.  It is also possible that the  results I am
>>>getting are somehow wrong...
>>
>>"Tie breaking" is not the issue. If you read the thing Anthony posted, you'd
>>know that all the P4's resources are evenly divided: instruction window, reorder
>>registers, reorder buffer, load/store queues, everything. How can you have
>>unbalanced execution when each thread gets half of everything? I don't think you
>>could if you wanted.
>>
>>When I suggested that you and Vincent had the same problem, with unbalanced
>>processing, you bitched that you had really reported two problems. Well, that's
>>not so clear to me. If your dual proc machine is searching 1.5x the NPS that it
>>should, is it unreasonable to think that maybe one of the processors is somehow
>>idle (spinning) half the time? Because if that's the case, running both threads
>>on a HT processor could easily result in a 66-33 disparity in NPS per thread,
>>which is pretty damn close to what you're seeing.
>>
>>Occam's razor, Bob.
>>
>>-Tom
>
>
>But the principle doesn't apply here.  Why?  Because I _know_ that one thread
>is not "spinning" or "waiting" whatsoever.  I _carefully_ account for every
>spin while waiting on work.  It averages about 5% out of 400% on a 3 minute
>search.
>
>There is more going on than meets the eye, from several perspectives.

Sure. What's more likely?

1. Your code that accounts for spins is not working right for some reason.
2. Your splitting algorithm sends all the slow nodes to the 2nd thread.
3. Your 2nd processor underclocks itself by 1/2 when it detects that it's
running a Crafty thread.
4. You really are searching those nodes, but due to a bug from overclocking, the
ALU doesn't increment the node counter.
5.

-Tom
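
For reference, the arithmetic behind the "1.5x" and "66-33" figures quoted above can be sketched in a few lines of Python. This is only a toy model, not anything taken from Crafty: the function name `nps_model` and the assumption that each thread's share of total NPS is proportional to the time it spends actually searching are both made up for illustration, and real HT behavior (where a spinning thread still consumes execution resources) is likely messier.

```python
def nps_model(idle_fraction):
    """Toy model: thread A always searches; thread B idles (spins)
    for idle_fraction of the time. Assumes NPS is proportional to
    busy time, which is a simplification."""
    busy_a = 1.0
    busy_b = 1.0 - idle_fraction
    # Total throughput relative to one fully busy processor.
    speedup = busy_a + busy_b
    # Fraction of all searched nodes credited to each thread.
    share_a = busy_a / speedup
    share_b = busy_b / speedup
    return speedup, share_a, share_b


speedup, share_a, share_b = nps_model(0.5)
print(f"speedup {speedup:.2f}x, split {share_a:.0%}-{share_b:.0%}")
```

With one thread idle half the time, the model gives a 1.5x speedup and roughly a two-thirds / one-third split in nodes per thread, which is where the comparison to the reported 70-30 comes from.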



Last modified: Thu, 15 Apr 21 08:11:13 -0700
