Computer Chess Club Archives


Subject: Re: Magic 200MHz

Author: Tom Kerrigan

Date: 23:21:43 05/26/03


On May 27, 2003 at 00:19:49, Robert Hyatt wrote:

>On May 26, 2003 at 15:24:02, Tom Kerrigan wrote:
>
>>On May 26, 2003 at 13:54:44, Robert Hyatt wrote:
>>>On May 24, 2003 at 20:04:28, Tom Kerrigan wrote:
>>>>Second, you're right, I didn't think it out. Since both logical processors use
>>>>the same caches, there IS no cache coherency problem. In fact, cache coherency
>>>>is completely unrelated to hyperthreading, isn't it? I don't even know why you
>>>>would use the two terms in the same sentence. But here you are, using "cache
>>>>coherency" as an explanation for one logical processor running a thread faster
>>>>than the other. ("Well-thought-out hunch," sure.)
>>>
>>>I have talked about two problems.  One is the "unbalanced" hyperthreading,
>>>when running Crafty.  And I carefully explained that I can run two threads
>>>with SMT off, or four with SMT on.  I can't run a one and two thread test as
>>>my machine has two processors and one can't be removed without a terminator
>>>that I don't have.
>>>
>>>Therefore, the cache coherency issue seems to be important in that it is hurting
>>>performance with two physical processors.  It seems to be the only viable (at
>>>the moment) explanation for the unbalanced SMT performance as well.
>>
>>Fine. What was your point again? That HT's design favors one logical processor
>>over the other? Because the cache coherency system _might_? Okay, I give up. One
>>_small_ aspect of HT _may_ favor a specific logical processor, based on one
>>experiment with one program that hasn't been independently reproduced. Man, you
>>sure won that argument. (I especially like all the handwaving about P3 cache
>>line sizes even though the discussion was about HT.)
>
>There is no "hand waving".
>
>I mentioned cache as a possible explanation of why my PIII systems run
>threaded crafty much more efficiently than PIV systems run it.  I have looked
...
>But since it _is_ a potential problem, and since my program is also seeing a
>very odd SMT balance between logical cpus, it is a potential issue there as
>well.

A likely cause for one problem (unrelated to HT) is also a likely cause for an HT
problem? Brilliant. You must be Sherlock Holmes reincarnated.

>>>And again, so what?  Who does a "halt"?  Windows .net server _might_.  But
>>>no others I have tested...
>>Any version of Windows NT and Linux. Really, Bob, just search for "halt
>>instruction windows" in Google.
>I don't need to.  I have this really nasty habit of fiddling with the linux
>kernel source frequently.  I _know_ what it does.

So how do you explain your statement that no OSs you've "tested" issue halts? I
mean, Linux issues halts. Did you not "test" Linux?

>>Besides, how do you explain HT processors (with HT enabled) running single
>>threaded programs at full speed, as they do in all online hardware reviews? If
>>the operating systems aren't issuing HALT instructions (as you contend), that
>>single thread is only getting half the chip's resources. Doesn't seem likely
>>that it would run at full speed with half the resources, does it?
>
>First, your statement is wrong.  There have been reports that running a
>single thread with SMT on runs slower than a single thread with SMT off.
>
>That was where this discussion started on RC5 in fact.

Really... what post was that? Because I can only find posts saying that RC5
slows down with two threads, not one.

>As _I_ have repeatedly said, I have _never_ seen a case where a single
>thread runs slower with SMT on.  That was my original claim.  I have yet
>to see anything _different_ anywhere.
>
>So I don't quite see what your point is unless it is to reinforce _my_ point
>about SMT not slowing things down in any way I can see...  Unless we talk about
>the case of running two threads using two logical cpus being slower than running
>one thread on one real cpu.  I can see where _that_ could cause speed issues in
>lots of ways, particularly with a parallel search.

Hmm, that's interesting, because you couldn't understand that a few days ago.

"Could it be slower in some?  Of course.  But then the algorithm(s) in question
need work, obviously..."

http://www.talkchess.com/forums/1/message.html?297442

I mean, really, my first post to this thread was to explain how a multithreaded
benchmark could run slower than a single threaded benchmark, assuming no
algorithm inefficiencies.

If you agreed with me then, as you seem to now, then why have we been arguing
about this for days?

>>"I have concluded that _most_ of the resources are dynamically allocated between
>>the two logical processors 'as needed'."
>>
>>So if you don't think that those buffers constitute "most" of the OOOE
>>resources, then how about you name which huge, important buffers are dynamically
>>allocated? Please, Bob, grace us with your infinite hyperthreaded wisdom.
>
>The _critical_ resources are the "pipes" that execute micro-ops, and the
>register rename pool (not the rename tables) that holds enough data to keep
>things busy.  Followed by memory read/write buffers...

The "pipes," as you so eloquently call them, involve all the buffers that Intel
says are split, and the memory read/write buffers are also split. The only thing
that may be duplicated, instead of split, is the rename register file. And
really, if you're talking about how balanced the processors are, duplicated
might as well be the same as split.
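
To make the two resource arguments above concrete, here is a toy Python model,
offered as an illustration only. It assumes a "split" buffer is halved while
both logical processors are active and handed whole to the remaining one when
its sibling is halted, and that a "duplicated" resource gives each logical
processor its own full-size copy. The function name shares() and the buffer
size of 128 entries are made up for the example; they are not Intel's numbers
or any real API.

```python
def shares(total_entries, policy, active=(True, True)):
    """Toy model: return (cpu0_entries, cpu1_entries) for one resource.

    policy="split":      one physical buffer, halved while both logical
                         CPUs are active, given whole to a lone active CPU.
    policy="duplicated": each active logical CPU has a private full copy.
    A halted logical CPU is modeled as holding zero entries.
    """
    if policy not in ("split", "duplicated"):
        raise ValueError(f"unknown policy: {policy}")

    running = sum(active)
    if running == 0:
        return (0, 0)

    if policy == "duplicated":
        per_cpu = total_entries
    elif running == 2:
        per_cpu = total_entries // 2   # both active: static halves
    else:
        per_cpu = total_entries        # sibling halted: halves recombined

    return tuple(per_cpu if is_active else 0 for is_active in active)


BUFFER_ENTRIES = 128  # hypothetical size, for illustration only

both_split = shares(BUFFER_ENTRIES, "split")
both_dup = shares(BUFFER_ENTRIES, "duplicated")
one_halted = shares(BUFFER_ENTRIES, "split", active=(True, False))

print("split, both active:     ", both_split)
print("duplicated, both active:", both_dup)
print("split, sibling halted:  ", one_halted)
```

Under these assumptions both policies hand the two logical processors equal
shares, so neither one can favor a specific logical processor. The halted case
shows why a lone thread only gets the whole buffer if the OS actually halts the
idle logical processor.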

-Tom


