Computer Chess Club Archives



Subject: Re: But, Re: Questions re P4 3.03 with HT ??

Author: Vincent Diepeveen

Date: 16:45:44 12/10/02



On December 10, 2002 at 16:43:29, Matt Taylor wrote:

Bob is telling you the wrong thing.

The chance that a process is spinning is very small with 2 processors.

If it were not, Bob's program would get a very bad speedup on a dual machine,
something like 0.9 at 2 CPUs instead of the claimed 1.7. (My own data shows
1.6, and Bob's data is not accurate, as he says himself, because he ran the
test only once on his machine, so every data point can be far off. In the
same way, he treats the 5 positions here that show a positive speedup,
without any counterexample, as proof of a speedup, while 6 tests that prove
the opposite are supposedly not accurate enough information.)
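The ratios being argued over can be read straight off the two `bench` runs quoted further down in this thread. A minimal sketch in C, assuming "speedup" means 1-CPU elapsed time divided by 2-CPU elapsed time; the struct and function names are made up for illustration and are not from Crafty:

```c
#include <stdio.h>

/* Figures copied from the two Crafty v18.15 "bench" runs quoted in this thread. */
struct bench_run {
    double nodes;    /* "Total nodes" */
    double nps;      /* "Raw nodes per second" */
    double seconds;  /* "Total elapsed time" */
};

/* 1-CPU time over 2-CPU time: > 1.0 means the 2-CPU run finished sooner. */
double time_speedup(struct bench_run one, struct bench_run two) {
    return one.seconds / two.seconds;
}

/* Raw search speed of the 2-CPU run relative to the 1-CPU run. */
double nps_ratio(struct bench_run one, struct bench_run two) {
    return two.nps / one.nps;
}

/* Tree size of the 2-CPU run relative to the 1-CPU run: < 1.0 means fewer nodes. */
double node_ratio(struct bench_run one, struct bench_run two) {
    return two.nodes / one.nodes;
}

void print_ratios(void) {
    struct bench_run one = {97487547.0, 1160566.0, 84.0};
    struct bench_run two = {94658095.0, 1314695.0, 72.0};
    printf("time speedup: %.3f\n", time_speedup(one, two));
    printf("nps ratio:    %.3f\n", nps_ratio(one, two));
    printf("node ratio:   %.3f\n", node_ratio(one, two));
}
```

With these figures it prints a time speedup of 1.167, an NPS ratio of 1.133 (the 13% NPS gain mentioned in the thread) and a node ratio of 0.971, i.e. roughly 2.9% fewer nodes in this one run. Since nps is nodes divided by time, the time speedup is just the NPS ratio divided by the node ratio, and the ratio of the two "SMP time-to-ply measurement" values (8.888889 / 7.619048) comes out to the same 1.167.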

Are you still following what I write?

No, it's just Hyatt who is trying to ignore the truth here.

The truth is that, in this world, the only big speedups with SMT are
reported by Wintel and Intel people.

Bob is one of them; that's my claim.

Bob is using machines from Nalimov, which are in the Wintel labs, to prove
that SMT works.

And that while the Intel documentation clearly shows that a 2.8GHz Xeon you
buy in a shop doesn't have working SMT at all.

Only the 3.0GHz Xeons and the Xeon MPs have it.

A week ago I posted here the exact reference to the PDF of the Intel
architecture guide.
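Whether a given processor even advertises Hyper-Threading can be checked in software. A minimal sketch in C, assuming GCC or Clang on x86 (it uses the compiler-provided `<cpuid.h>` header and its `__get_cpuid()` helper; the function name `cpu_reports_htt` is made up for this example). It reads the HTT flag, bit 28 of EDX from CPUID leaf 1. This flag only says the package can report more than one logical processor; it does not show that SMT is actually enabled in the BIOS or the OS, and later multi-core chips set it without SMT at all, so on its own it cannot settle which Xeons have working SMT.

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* Returns true if CPUID leaf 1 reports the HTT flag (EDX bit 28) and stores
   EBX[23:16], the maximum number of addressable logical processor IDs in the
   package, in *logical_ids. Falls back to false / 1 where CPUID is unavailable. */
bool cpu_reports_htt(unsigned *logical_ids) {
    *logical_ids = 1;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                        /* leaf 1 not supported */
    if (edx & (1u << 28)) {
        *logical_ids = (ebx >> 16) & 0xffu;  /* EBX[23:16] */
        return true;
    }
#endif
    return false;
}
```

On Linux the same bit shows up as the `ht` flag in /proc/cpuinfo.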

Best regards,
Vincent

>On December 10, 2002 at 16:35:11, Robert Hyatt wrote:
>
>>On December 10, 2002 at 14:31:51, Matt Taylor wrote:
>>
>>>On December 10, 2002 at 13:18:45, Robert Hyatt wrote:
>>>
>>>>On December 10, 2002 at 12:31:46, Matt Taylor wrote:
>>>>
>>>>>On December 10, 2002 at 12:21:33, Robert Hyatt wrote:
>>>>>
>>>>>>On December 10, 2002 at 11:34:45, Jeremiah Penery wrote:
>>>>>>
>>>>>>>On December 10, 2002 at 10:57:40, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On December 10, 2002 at 09:08:10, Vincent Diepeveen wrote:
>>>>>>>>
>>>>>>>>>Matt, I don't know it for Crafty or other crap products. Crafty, as we
>>>>>>>>>see in the test, needs less nodes when running MT=2,
>>>>>>>>
>>>>>>>>I realize this is hard for you to do, but is it _possible_ that you can stick
>>>>>>>>to _real_ data when you post?  The above is _absolute_ crap.  Crafty does
>>>>>>>>_not_ "need less nodes when MT=2".  In some positions, yes, but in
>>>>>>>>more positions it needs _more_.  And for the average case it needs _more_.
>>>>>>>>
>>>>>>>>I don't know why you continue to post something that any person here can
>>>>>>>>refute simply by running the code.  I've done it for you many times.  The
>>>>>>>>above is false.  Please find something _else_ to wave your hands about.
>>>>>>>
>>>>>>>It came from the original data in this thread:
>>>>>>
>>>>>>So?  That is over 6 positions.  Using that to prove that a program searches
>>>>>>"fewer nodes with mt=2" is total nonsense, as is the claim that a program +will+
>>>>>>search fewer nodes overall using two threads.  It simply doesn't happen.  And it
>>>>>>falls in the same class as the perpetual-motion machine...  It doesn't work...
>>>>>
>>>>>I like Cold Fusion a little better.
>>>>
>>>>I'm not going that far.  There is always a remote possibility that something
>>>>like that might be possible given the right materials and conditions.  Perpetual
>>>>motion is another thing entirely, as is a speedup > 2.0 with two processors.  :)
>>>
>>>Yeah. I like the Cold Fusion example because the data does not justify the
>>>claim. But yeah, it is difficult to see how a second processor would possibly
>>>create a speed-up of more than a factor of 2. Obviously if that (legitimately)
>>>happens, more than just the number of CPUs has changed.
>>>
>>>>>>>Crafty v18.15
>>>>>>>White(1): bench
>>>>>>>Running benchmark. . .
>>>>>>>......
>>>>>>>Total nodes: 97487547
>>>>>>>Raw nodes per second: 1160566
>>>>>>>Total elapsed time: 84
>>>>>>>SMP time-to-ply measurement: 7.619048
>>>>>>>White(1):
>>>>>>>-------------------------------------
>>>>>>>Crafty v18.15 (2 cpus)
>>>>>>>White(1): bench
>>>>>>>Running benchmark. . .
>>>>>>>......
>>>>>>>Total nodes: 94658095
>>>>>>>Raw nodes per second: 1314695
>>>>>>>Total elapsed time: 72
>>>>>>>SMP time-to-ply measurement: 8.888889
>>>>>>>
>>>>>>>
>>>>>>>>What is "a buggy crafty?"  And what is the 13-16%?  I posted _real_ data.  You
>>>>>>>>post fantasy without even having access to a box?  And that is fact???
>>>>>>>
>>>>>>>You can see also that the NPS speedup in that above data is 13%.
>>>>>>
>>>>>>For _one_ test...  With a version of the program that has a _known_ problem with
>>>>>>SMT.
>>>>>
>>>>>You mean the pause issue, or is there more than just that?
>>>>>
>>>>>-Matt
>>>>
>>>>Yes....  but not just in the Lock() code... there is a critical spin-wait that
>>>>needs a pause otherwise one thread will be running in a spin-wait while the other
>>>>thread is waiting to get scheduled and _it_ is the one that will give the "spinner"
>>>>something to work on.  :)
>>>
>>>Ah. I'm interested in seeing the results, but I'm not expecting a huge gain from
>>>using pause. If one thread is beating on the lock, it leaves the majority of the
>>>execution resources and bandwidth for the other logical thread. I don't think
>>>that reducing the polling rate of the L1 cache will affect results much.
>>>
>>>I guess the only thing we can say right now is, "We will see!"
>>>
>>>-Matt
>>
>>
>>Think about it for a minute.  You have two processes to schedule.  One is doing
>>something useful, the other is busy spinning. So every chance the "spinner" gets, it
>>executes full-speed ahead.  And while it is executing, the _other_ thread is sitting.
>>The CPU has a 50% chance of choosing the _wrong_ thread when one is computing doing
>>useful work and the other is spinning doing nothing but waiting on something to do...
>>
>>and that is what pause helps with, the "spinner" makes one pass thru the spin loop
>>and then says "run the other thread now"...
>
>That's true for a scheduler on a single processor, but that's not how
>Hyperthreading works as I understand it. Then again, it is possible that the
>docs I read are wrong. (The last thing I read about HT was over 2 years ago.)
>
>They said that HT allows -concurrent- scheduling of threads, but the threads
>obviously cannot make use of the same execution resources. If this is correct,
>one thread would be spinning (consuming bandwidth to the L1 cache) while the
>other thread was doing real work.
>
>For now I'm going to stick to what I have read. I'll poke around sometime later
>this week and see if I can find any updated material on the inner workings of
>HT.
>
>-Matt
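Both descriptions above are about a thread busy-waiting until another thread frees a lock or hands it work. A minimal C11 sketch of such loops, assuming a simple test-and-test-and-set lock and a single shared flag for handing over work; `spin_lock`, `spin_acquire`, `spin_release`, `work_ready`, `wait_for_work`, `hand_over_work` and `SPIN_HINT` are hypothetical names, not Crafty's code. On x86 each idle pass executes `pause` through the `_mm_pause()` intrinsic. Intel's documentation recommends this in spin-wait loops: it hints that the code is a spin-wait, briefly delays the spinning thread (leaving more shared execution resources to the other logical processor on a Hyper-Threading CPU) and avoids a memory-order mis-speculation penalty when the loop exits. On other targets the hint compiles to nothing.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SPIN_HINT() _mm_pause()      /* x86 PAUSE: "this is a spin-wait loop" */
#else
#define SPIN_HINT() ((void)0)        /* no hint available: plain busy loop */
#endif

/* Hypothetical test-and-test-and-set lock: held == 0 means free. */
struct spin_lock {
    atomic_int held;
};

void spin_acquire(struct spin_lock *l) {
    for (;;) {
        /* Try to take the lock with one atomic write. */
        if (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0)
            return;
        /* Someone else holds it: poll with plain reads, pausing each pass. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            SPIN_HINT();
    }
}

void spin_release(struct spin_lock *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Hypothetical flag a busy thread sets when it has split off work to share. */
static atomic_bool work_ready = false;

/* Idle thread: spin until work is published, then claim it. */
void wait_for_work(void) {
    for (;;) {
        while (!atomic_load_explicit(&work_ready, memory_order_acquire))
            SPIN_HINT();
        /* Claim the work; if another idle thread got it first (or the weak
           compare-and-swap failed spuriously), go back to waiting. */
        bool expected = true;
        if (atomic_compare_exchange_weak_explicit(&work_ready, &expected, false,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
    }
}

/* Busy thread: publish work for an idle thread. */
void hand_over_work(void) {
    atomic_store_explicit(&work_ready, true, memory_order_release);
}
```

The inner loops only read the shared variable and retry the atomic exchange or compare-and-swap once it looks free, so a waiting thread mostly polls its own cached copy rather than issuing a stream of writes.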




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.