Computer Chess Club Archives


Subject: Re: OT: P4- 3 GHz with hyper-threading

Author: Eugene Nalimov

Date: 11:22:10 11/01/02


Sorry, after I wrote the previous message, I remembered about parallel
non-determinism, so I re-ran the tests 5 times on the same system and calculated
the average.

mt=2: times are 42, 43, 39, 40, 38, average is 40.
mt=4: times are 36, 31, 37, 29, 38, average is 34.

So the average speedup is even greater than I estimated earlier: about 15%.
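
The averages and percentages in this thread can be rechecked with a few lines of Python. This is only a sketch of the arithmetic, not part of the original test setup: it assumes the first row of timings is the 2-thread run and the second row the 4-thread run, and the helper name `time_reduction` is made up for illustration. The single-run figures are taken from the quoted earlier message.

```python
from statistics import mean

# Wall-clock times in seconds for the five repeated runs, copied from the post.
# Assumption: the first row is the 2-thread run, the second the 4-thread run.
times_2_threads = [42, 43, 39, 40, 38]
times_4_threads = [36, 31, 37, 29, 38]


def time_reduction(before, after):
    """Fraction by which elapsed time shrinks going from `before` to `after`."""
    return 1.0 - after / before


avg_2 = mean(times_2_threads)
avg_4 = mean(times_4_threads)
rerun_reduction = time_reduction(avg_2, avg_4)

# Single-run figures from the quoted earlier message (mt=2 versus mt=4).
nps_gain = 2006 / 1705 - 1.0                  # relative gain in knps
single_run_reduction = time_reduction(38, 35)  # relative drop in seconds

print(f"average, 2 threads: {avg_2:.1f} s")
print(f"average, 4 threads: {avg_4:.1f} s")
print(f"time reduction over 5 runs: {rerun_reduction:.1%}")
print(f"nps gain, single run: {nps_gain:.1%}")
print(f"time reduction, single run: {single_run_reduction:.1%}")
```

Averaging five runs per setting is what smooths out the run-to-run variation that parallel search introduces; a single pair of runs can easily overstate or understate the gain.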

Thanks,
Eugene

On November 01, 2002 at 13:58:06, Eugene Nalimov wrote:

>I cannot produce the test you are demanding, as I don't have physical access to
>the system on which I ran the test, but here are my results.
>
>Dual P4/2.4GHz, hyper-threading turned on, Windows XP Professional.
>Unmodified Crafty 19.0 (i.e. with "bad" spinlock loop).
>"Bench" results (executable restarted after each test).
>
>mt=1:   976knps, 57 seconds
>mt=2: 1,705knps, 38 seconds
>mt=4: 2,006knps, 35 seconds
>
>I.e. there is not only ~17% raw nps speedup, but *absolute time* is also ~8%
>smaller.
>
>And that is for an executable that is not hyper-threading aware, i.e. one that
>contains the bad spinlock loop.
>
>I tested exactly the executable that is on Bob's FTP site. You can download it
>yourself.
>
>Thanks,
>Eugene
>
>On November 01, 2002 at 13:06:53, Vincent Diepeveen wrote:
>
>>On November 01, 2002 at 12:20:14, Robert Hyatt wrote:
>>
>>Feel free to ship a version of Crafty that doesn't use the spinlock,
>>or whatever you want to modify. I'll test it extensively for you
>>on all the P4s I can get my hands on...
>>
>>I would be really amazed if you get even 0.1% faster in nodes a
>>second...
>>
>>...of course it must be a fair comparison, in contrast to what
>>Intel shows. They do the following comparison:
>>
>>  a) some feature called 'SMT' in the bios turned on
>>     - just running 2 threads then
>>  b) turning it off
>>     - also running 2 threads at it
>>
>>As everyone who is not naive knows, you also need
>>to do the following test:
>>
>>  a) some feature called 'SMT' in the bios turned on
>>     - just running 1 thread eating all system time
>>  b) turning it off
>>     - also running 1 thread eating all system time
>>
>>There shouldn't be a speed difference between a and b of course.
>>
>>That verification step is missing.
>>
>>
>>
>>>On November 01, 2002 at 11:56:56, Vincent Diepeveen wrote:
>>>
>>>>On November 01, 2002 at 10:41:25, Robert Hyatt wrote:
>>>>
>>>>>On October 31, 2002 at 10:53:07, Vincent Diepeveen wrote:
>>>>>
>>>>>>On October 30, 2002 at 06:59:21, Terje Vagle wrote:
>>>>>>
>>>>>>>Hi all,
>>>>>>>
>>>>>>>The new cpu from intel will have a new function called
>>>>>>>hyper-threading.
>>>>>>>
>>>>>>>This will make the operating system able to recognize the cpu as if it was
>>>>>>>2 cpu's.
>>>>>>>
>>>>>>>Could the programs with smp-support make use of this?
>>>>>>>
>>>>>>>Regards,
>>>>>>>
>>>>>>>Terje Vagle
>>>>>>
>>>>>>No, chess programs cannot make use of that feature at all. It is sad but
>>>>>>true. Hyperthreading is a cool thing for the future, but the P4
>>>>>>is too small a processor for hyperthreading to
>>>>>>work.
>>>>>>
>>>>>>Apart from that, a major problem is that even if we had a great processor
>>>>>>which really allowed hyperthreading to be effective, the threads
>>>>>>would run at unequal speeds.
>>>>>>
>>>>>>Hyper threading is supposed to work for 2 threads where 1 is a fast
>>>>>>thread and the other is some kind of background thread eating little cpu
>>>>>>time.
>>>>>>
>>>>>>In chess programs, a second search thread which just runs now and
>>>>>>then in the background is simply impossible to use.
>>>>>
>>>>>
>>>>>It is not impossible at all.  The only problem was spinlocks and Eugene
>>>>>posted a link to an Intel document that describes how to solve this problem.
>>>>>
>>>>>Given that solution, hyper-threading will work just fine since spinlocks
>>>>>won't confuse the processor...
>>>>>
>>>>>It won't be 2x faster, but it will certainly be faster if you can run a second
>>>>>thread while the first is blocked on a memory access...
>>>>
>>>>No it won't be 2 times faster. suppose you start crafty with 2 threads.
>>>
>>>I didn't say it would be _two_ times faster.
>>>
>>>I said it would be _faster_.
>>>
>>>And it will.
>>>
>>>
>>>
>>>>
>>>>thread A starts search and has 1.e4,e5
>>>>thread B starts and continues with 1.d4
>>>>
>>>>now when A is ready, B will still be busy with its own search space,
>>>>and delay thread A time and again.
>>>>
>>>>that'll slow things down incredibly.
>>>>
>>>
>>>
>>>Except that isn't how it works.  The threads co-execute in an intermingled
>>>way as one blocks for a memory read the other fills in the gap.  It is
>>>something like having 1.5 cpus...  and it does work.
>>>
>>>
>>>
>>>>You'll be a lot slower than searching with a single thread!
>>>>
>>>
>>>
>>>Not very likely...
>>>
>>>
>>>
>>>
>>>>Also note that there is just 8 KB data cache and just like
>>>>40 registers to rename variables. then another 12KB tracecache.
>>>>
>>>>*both* threads are eating from that 8 KB and 12KB tracecache,
>>>>that is an additional problem they 'overlook'.
>>>>
>>>
>>>
>>>That is a problem on an SMP machine.  But _both_ threads are executing
>>>the _same_ code anyway...  so that isn't a problem.  At least for me.
>>>
>>>For you it is different because you are not using "shared everything" in
>>>lightweight threads, so your results might be different.  But all my threads
>>>share the exact same executable instruction code...
>>>
>>>
>>>
>>>
>>>>As you can see from the graphs, SMT usually brings zero speedup.
>>>
>>>I have seen numbers around 1.3 up to 1.5...  which is not to be
>>>ignored.
>>>
>>>
>>>
>>>>
>>>>Try crafty on a 2.4Ghz single cpu P4 or P4-Xeon please (northwood) or
>>>>above. Not on a slower P4 or P4-Xeon. Of course we go for the latest
>>>>hardware...
>>>
>>>
>>>Why does it matter?  Hyper-Threading is Hyper-Threading, unless you are
>>>going to start that memory speed nonsense.  And, in fact, the faster the
>>>processor vs memory speed, the better hyperthreading should perform.  Just
>>>like the greater the difference in processor speed vs disk speed, the better
>>>normal operating systems do at running multiple processes.
>>>
>>>
>>>>
>>>>Just try it like i tried at Jan Louwman's 2.4Ghz P4s and 2.53Ghz P4s.
>>>
>>>That says it all.  "Like I tried it".  As if that is a comprehensive and
>>>exhaustive testing?
>>>
>>>>
>>>>I can't measure *any* speedup *anyhow*.
>>>>
>>>
>>>
>>>Why am I not surprised???
>>>
>>>
>>>
>>>>Also theoretically I see major problems for the P4 chip even if you
>>>>have software which could theoretically profit.
>>>
>>>
>>>"theoretically".
>>>
>>>:)
>>>
>>>:)
>>>
>>>:)
>>>
>>>Theory from someone that doesn't know theory.
>>>
>>>:)
>>>
>>>:)


