Computer Chess Club Archives



Subject: Re: But, Re: Questions re P4 3.03 with HT ??

Author: Matt Taylor

Date: 13:43:29 12/10/02



On December 10, 2002 at 16:35:11, Robert Hyatt wrote:

>On December 10, 2002 at 14:31:51, Matt Taylor wrote:
>
>>On December 10, 2002 at 13:18:45, Robert Hyatt wrote:
>>
>>>On December 10, 2002 at 12:31:46, Matt Taylor wrote:
>>>
>>>>On December 10, 2002 at 12:21:33, Robert Hyatt wrote:
>>>>
>>>>>On December 10, 2002 at 11:34:45, Jeremiah Penery wrote:
>>>>>
>>>>>>On December 10, 2002 at 10:57:40, Robert Hyatt wrote:
>>>>>>
>>>>>>>On December 10, 2002 at 09:08:10, Vincent Diepeveen wrote:
>>>>>>>
>>>>>>>>Matt, I don't know it for crafty or other crap products. Crafty as we
>>>>>>>>see in test needs less nodes when running MT=2,
>>>>>>>
>>>>>>>I realize this is hard for you to do, but is it _possible_ that you can stick
>>>>>>>to _real_ data when you post?  The above is _absolute_ crap.  Crafty does
>>>>>>>_not_ "need less nodes when MT=2".  In some positions, yes, but in
>>>>>>>more positions it needs _more_.  And for the average case it needs _more_.
>>>>>>>
>>>>>>>I don't know why you continue to post something that any person here can
>>>>>>>refute simply by running the code.  I've done it for you many times.  The
>>>>>>>above is false.  Please find something _else_ to wave your hands about.
>>>>>>
>>>>>>It came from the original data in this thread:
>>>>>
>>>>>So?  That is over 6 positions.  Using that to prove that a program searches
>>>>>"fewer nodes with mt=2" is total nonsense, as is the claim that a program +will+
>>>>>search fewer nodes overall using two threads.  It simply doesn't happen.  And it
>>>>>falls in the same class as the perpetual-motion machine...  It doesn't work...
>>>>
>>>>I like Cold Fusion a little better.
>>>
>>>I'm not going that far.  There is always a remote possibility that something
>>>like that might be possible given the right materials and conditions.
>>>Perpetual motion is another thing entirely, as is a speedup > 2.0 with two
>>>processors.  :)
>>
>>Yeah. I like the Cold Fusion example because the data does not justify the
>>claim. But yeah, it is difficult to see how a second processor would possibly
>>create a speed-up of more than a factor of 2. Obviously if that (legitimately)
>>happens, more than just the number of CPUs has changed.
>>
>>>>>>Crafty v18.15
>>>>>>White(1): bench
>>>>>>Running benchmark. . .
>>>>>>......
>>>>>>Total nodes: 97487547
>>>>>>Raw nodes per second: 1160566
>>>>>>Total elapsed time: 84
>>>>>>SMP time-to-ply measurement: 7.619048
>>>>>>White(1):
>>>>>>-------------------------------------
>>>>>>Crafty v18.15 (2 cpus)
>>>>>>White(1): bench
>>>>>>Running benchmark. . .
>>>>>>......
>>>>>>Total nodes: 94658095
>>>>>>Raw nodes per second: 1314695
>>>>>>Total elapsed time: 72
>>>>>>SMP time-to-ply measurement: 8.888889
>>>>>>
>>>>>>
>>>>>>>What is "a buggy crafty?"  And what is the 13-16%?  I posted _real_ data.  You
>>>>>>>post fantasy without even having access to a box?  And that is fact???
>>>>>>
>>>>>>You can also see that the NPS speedup in the data above is 13%.
>>>>>
>>>>>For _one_ test...  With a version of the program that has a _known_ problem with
>>>>>SMT.
>>>>
>>>>You mean the pause issue, or is there more than just that?
>>>>
>>>>-Matt
>>>
>>>Yes....  but not just in the Lock() code... there is a critical spin-wait that
>>>needs a pause; otherwise one thread will be running in a spin-wait while the
>>>other thread is waiting to get scheduled, and _it_ is the one that will give the
>>>"spinner" something to work on.  :)
>>
>>Ah. I'm interested in seeing the results, but I'm not expecting a huge gain from
>>using pause. If one thread is beating on the lock, it leaves the majority of the
>>execution resources and bandwidth for the other logical thread. I don't think
>>that reducing the polling rate of the L1 cache will affect results much.
>>
>>I guess the only thing we can say right now is, "We will see!"
>>
>>-Matt
>
>
>Think about it for a minute.  You have two processes to schedule.  One is doing
>something useful, the other is busy spinning.  So every chance the "spinner" gets,
>it executes full-speed ahead.  And while it is executing, the _other_ thread is
>sitting.  The CPU has a 50% chance of choosing the _wrong_ thread when one is
>doing useful work and the other is spinning, doing nothing but waiting on
>something to do...
>
>and that is what pause helps with: the "spinner" makes one pass thru the spin
>loop and then says "run the other thread now"...

That's true for a scheduler on a single processor, but that's not how
Hyperthreading works as I understand it. Then again, it is possible that the
docs I read are wrong. (The last thing I read about HT was over 2 years ago.)

They said that HT allows -concurrent- scheduling of threads, but the threads
obviously cannot use the same execution resources at the same time. If this is
correct, one thread would be spinning (consuming bandwidth to the L1 cache) while
the other thread does real work.
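A minimal sketch of the two spin-waits mentioned in this thread (a lock, and an idle thread waiting for work), showing where PAUSE goes. It assumes C11 `<stdatomic.h>` and an x86 compiler that provides `_mm_pause()` through `<immintrin.h>`; the names `demo_lock`, `demo_wait_for_work` and `cpu_relax` are hypothetical and are not Crafty's code. Intel recommends executing PAUSE inside spin-wait loops; on a Hyper-Threading processor it is intended to keep the spinning logical processor from consuming shared resources its sibling could use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* cpu_relax(): execute the x86 PAUSE instruction inside spin loops.
   Assumption: an x86 compiler with <immintrin.h>; on other targets
   this sketch simply does nothing. */
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
static inline void cpu_relax(void) { _mm_pause(); }
#else
static inline void cpu_relax(void) { }
#endif

/* Hypothetical test-and-test-and-set spin lock (not Crafty's Lock()). */
typedef struct {
    atomic_bool held;
} demo_lock;

static void demo_lock_init(demo_lock *l) {
    atomic_init(&l->held, false);
}

static void demo_lock_acquire(demo_lock *l) {
    for (;;) {
        /* Try to take the lock: exchange returns the previous value. */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
        /* Busy: spin on plain loads, pausing on every pass so a sibling
           logical processor on the same core can use more of its resources. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            cpu_relax();
    }
}

static void demo_lock_release(demo_lock *l) {
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/* Hypothetical idle loop: wait until another thread posts nonzero work. */
static int demo_wait_for_work(atomic_int *work) {
    int w;
    while ((w = atomic_load_explicit(work, memory_order_acquire)) == 0)
        cpu_relax();
    return w;
}
```

The PAUSE only changes how the waiting thread spins, not the correctness of the lock or the hand-off; how much it helps on a given HT machine is the open question in this thread.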

For now I'm going to stick to what I have read. I'll poke around sometime later
this week and see if I can find any updated material on the inner workings of
HT.

-Matt
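A minimal sketch that redoes the arithmetic behind the quoted 13% figure, using the totals from the two Crafty v18.15 `bench` runs quoted earlier in the thread. The macro and function names are hypothetical. The assumption that "Raw nodes per second" is total nodes divided by elapsed seconds is inferred from the numbers (97487547 / 84 ≈ 1160566), not taken from Crafty's source.

```c
#include <assert.h>

/* Totals copied from the two quoted Crafty v18.15 "bench" runs. */
#define NODES_1CPU   97487547.0
#define SECONDS_1CPU 84.0
#define NODES_2CPU   94658095.0
#define SECONDS_2CPU 72.0

/* Nodes per second of wall-clock time (assumed meaning of "Raw NPS"). */
static double nps(double nodes, double seconds) {
    assert(seconds > 0.0);
    return nodes / seconds;
}

/* How much faster the 2-cpu run searched nodes (about 1.133, i.e. 13%). */
static double nps_speedup(void) {
    return nps(NODES_2CPU, SECONDS_2CPU) / nps(NODES_1CPU, SECONDS_1CPU);
}

/* How much sooner the 2-cpu run finished the benchmark (84 / 72, about 1.167). */
static double time_speedup(void) {
    return SECONDS_1CPU / SECONDS_2CPU;
}

/* Nodes searched with 2 cpus relative to 1 cpu (about 0.971). */
static double node_ratio(void) {
    return NODES_2CPU / NODES_1CPU;
}
```

The ratio of the two "SMP time-to-ply" figures (8.888889 / 7.619048 ≈ 1.167) matches `time_speedup()`. The roughly 2.9% drop in nodes is the only number here that bears on the "fewer nodes with MT=2" claim, and it comes from a single run over the 6-position benchmark.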




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.