Computer Chess Club Archives



Subject: Re: OT: P4- 3 GHz with hyper-threading

Author: Vincent Diepeveen

Date: 07:07:07 11/03/02


On November 02, 2002 at 00:10:17, Robert Hyatt wrote:

On the P4, with 1 decoder, a 12K i-cache and just 8KB of data cache,
i could measure no speedup. Only slowdowns if i tried to run
too many threads.

Your claims for crafty would prove that it somehow fits within the trace cache.

Also, i tested on SINGLE CPU P4s and there i could measure no speedup
at all. Only disasters. I will test crafty on those machines too.
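
For readers checking the figures quoted below, here is a minimal sketch in C of the arithmetic Eugene applies: scaling a run time by the clock ratio (his 50*(2.8/2.4)) and turning the mt=2 and mt=4 bench lines into percentage gains. The function names are made up for illustration and are not part of Crafty. The clock scaling assumes run time is inversely proportional to clock speed, which ignores memory speed.

```c
/* Hypothetical helpers for the bench arithmetic quoted in this thread. */

/* Estimate the run time at target_ghz from a run measured at measured_ghz,
   assuming time scales inversely with clock frequency (an assumption). */
static inline double scale_time_by_clock(double measured_seconds,
                                         double measured_ghz,
                                         double target_ghz)
{
    return measured_seconds * (measured_ghz / target_ghz);
}

/* Percentage gain in a throughput figure such as knps. */
static inline double percent_gain(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Percentage reduction in elapsed time. */
static inline double percent_time_saved(double before_seconds,
                                        double after_seconds)
{
    return (before_seconds - after_seconds) / before_seconds * 100.0;
}
```

With the quoted numbers, scale_time_by_clock(50, 2.8, 2.4) gives about 58.3 seconds, percent_gain(1705, 2006) about 17.7, and percent_time_saved(38, 35) about 7.9, in line with the ~17% and ~8% stated in the thread.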

>On November 01, 2002 at 17:26:39, Eugene Nalimov wrote:
>
>>Vincent,
>>
>>I am explaining it to you for the 3rd time: I can run tests on those systems, but I
>>have no physical access to them, so I cannot turn something in BIOS on or off.
>>That's why I compared results of a 2.4GHz system with hyperthreading on and a 2.8GHz
>>one with hyperthreading off -- to show that the results are the same if you take
>>into account the speed difference.
>>
>>Net result: you can look at the numbers I posted, and you will definitely see
>>that hyperthreading gives current Crafty, without any hyperthread-related
>>modifications, double-digit improvement.
>>
>>Thanks,
>>Eugene
>
>
>You are wasting your time.  He has made up his mind, declared hyper-threading
>worthless, and that is that.
>
>
>>
>>On November 01, 2002 at 17:07:10, Vincent Diepeveen wrote:
>>
>>>On November 01, 2002 at 14:55:50, Eugene Nalimov wrote:
>>>
>>>So you have a P4 2.8 available then.
>>>
>>>Can you post the results of that single-cpu P4 2.8 for the following 4
>>>tests:
>>>
>>>first:
>>> P4 2.8 SMT in bios off and
>>>   a) MT 1
>>>   b) MT 2
>>>
>>>secondly:
>>> P4 2.8 SMT in bios on and
>>>   a) MT 1
>>>   b) MT 2
>>>
>>>Thanks in advance,
>>>Vincent
>>>
>>>>Once again: the system I ran that test on is located in another building. I don't
>>>>want to bother the friend with rebooting/changing settings/etc. I ran the test
>>>>on a 2.8GHz P4 with hyperthreading turned off, and got 50 seconds at 1,113knps.
>>>>50*(2.8/2.4) == 58, so 57 seconds looks about right. (I think it is slightly
>>>>slower than estimate because memory on the 2.4GHz system is slower than on the 2.8GHz
>>>>one).
>>>>
>>>>I ran the same executable on an AMD/2000. It took 56 seconds at 994knps to run the
>>>>test, so 57 seconds at 976knps again looks right.
>>>>
>>>>Thanks,
>>>>Eugene
>>>>
>>>>On November 01, 2002 at 14:35:21, Vincent Diepeveen wrote:
>>>>
>>>>>On November 01, 2002 at 13:58:06, Eugene Nalimov wrote:
>>>>>
>>>>>>I cannot produce the test you are demanding, as I don't have physical access to
>>>>>>the system on which I ran the test, but here are my results.
>>>>>>
>>>>>>Dual P4/2.4GHz, hyperthreading turned on, Windows XP Professional.
>>>>>>Unmodified Crafty 19.0 (i.e. with "bad" spinlock loop).
>>>>>>"Bench" results (executable restarted after each test).
>>>>>
>>>>>also do the tests with SMT disabled in bios,
>>>>>it should produce the same results as in MT 1 and MT 2.
>>>>>If not then something different is wrong. In MT 4 it should
>>>>>produce something real bad there.
>>>>>
>>>>>Amazing that with 976knps at MT 1 you need only 57 seconds to finish the
>>>>>test. On a single cpu AMD i need (but of course with a bit older crafty version):
>>>>>
>>>>>White(1): hash 400MB
>>>>>hash table memory = 384M bytes.
>>>>>White(1): hashp 16MB
>>>>>pawn hash table memory = 10M bytes.
>>>>>White(1): bench
>>>>>Running benchmark. . .
>>>>>......
>>>>>Total nodes: 92683962
>>>>>Raw nodes per second: 827535
>>>>>Total elapsed time: 112
>>>>>SMP time-to-ply measurement: 5.714286
>>>>>White(1): quit
>>>>>execution complete.
>>>>>
>>>>>Or in short 112 seconds (visual c++ 6.0 sp4 proc pack default compile)
>>>>>and 827 K nps.
>>>>>
>>>>>You need millions of nodes less?
>>>>>
>>>>>>mt=1:   976knps, 57 seconds
>>>>>>mt=2: 1,705knps, 38 seconds
>>>>>>mt=4: 2,006knps, 35 seconds
>>>>>>
>>>>>>I.e. there is not only ~17% raw nps speedup, but *absolute time* is also ~8%
>>>>>>smaller.
>>>>>>
>>>>>>And that is for the executable that is non-hyperthread aware, i.e. contains bad
>>>>>>spinlock loop.
>>>>>>
>>>>>>I tested exactly the executable that is on Bob's FTP site. You can download it
>>>>>>yourself.
>>>>>>
>>>>>>Thanks,
>>>>>>Eugene
>>>>>>
>>>>>>On November 01, 2002 at 13:06:53, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On November 01, 2002 at 12:20:14, Robert Hyatt wrote:
>>>>>>>
>>>>>>>Feel free to ship a version of crafty that doesn't do spinlock
>>>>>>>or whatever you want to modify. I'll extensively test it for you
>>>>>>>at all P4s i can get my hands on...
>>>>>>>
>>>>>>>I would be really amazed if you get even 0.1% faster in nodes a
>>>>>>>second...
>>>>>>>
>>>>>>>...of course it must be a fair comparison, in contrast to what
>>>>>>>intel shows. They do the following comparison:
>>>>>>>
>>>>>>>  a) some feature called 'SMT' in the bios turned on
>>>>>>>     - just running 2 threads then
>>>>>>>  b) turning it off
>>>>>>>     - also running 2 threads at it
>>>>>>>
>>>>>>>Like everyone who is not so naive knows, you also need
>>>>>>>to do the following test:
>>>>>>>
>>>>>>>  a) some feature called 'SMT' in the bios turned on
>>>>>>>     - just running 1 thread eating all system time
>>>>>>>  b) turning it off
>>>>>>>     - also running 1 thread eating all system time
>>>>>>>
>>>>>>>There shouldn't be a speed difference between a and b of course.
>>>>>>>That verification step is missing.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>On November 01, 2002 at 11:56:56, Vincent Diepeveen wrote:
>>>>>>>>
>>>>>>>>>On November 01, 2002 at 10:41:25, Robert Hyatt wrote:
>>>>>>>>>
>>>>>>>>>>On October 31, 2002 at 10:53:07, Vincent Diepeveen wrote:
>>>>>>>>>>
>>>>>>>>>>>On October 30, 2002 at 06:59:21, Terje Vagle wrote:
>>>>>>>>>>>
>>>>>>>>>>>>Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>>The new cpu from intel will have a new function called
>>>>>>>>>>>>hyper-threading.
>>>>>>>>>>>>
>>>>>>>>>>>>This will make the operating system able to recognize the cpu as if it was
>>>>>>>>>>>>2 cpu's.
>>>>>>>>>>>>
>>>>>>>>>>>>Could the programs with smp-support make use of this?
>>>>>>>>>>>>
>>>>>>>>>>>>Regards,
>>>>>>>>>>>>
>>>>>>>>>>>>Terje Vagle
>>>>>>>>>>>
>>>>>>>>>>>No, chess programs cannot make use of that feature at all. It is sad but
>>>>>>>>>>>true. Hyperthreading is a cool thing for the future, but the P4
>>>>>>>>>>>is too small a processor to allow hyperthreading
>>>>>>>>>>>to work.
>>>>>>>>>>>
>>>>>>>>>>>Apart from that, a major problem is that even if we have a great processor
>>>>>>>>>>>which really allows hyperthreading to be effective, the threads
>>>>>>>>>>>run at unequal speeds.
>>>>>>>>>>>
>>>>>>>>>>>Hyper threading is supposed to work for 2 threads where 1 is a fast
>>>>>>>>>>>thread and the other is some kind of background thread eating little cpu
>>>>>>>>>>>time.
>>>>>>>>>>>
>>>>>>>>>>>In chess programs, a second search thread which just runs now and
>>>>>>>>>>>then in the background is simply impossible to use.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>It is not impossible at all.  The only problem was spinlocks and Eugene
>>>>>>>>>>posted a link to an Intel document that describes how to solve this problem.
>>>>>>>>>>
>>>>>>>>>>Given that solution, hyper-threading will work just fine since spinlocks
>>>>>>>>>>won't confuse the processor...
>>>>>>>>>>
>>>>>>>>>>It won't be 2x faster, but it will certainly be faster if you can run a second
>>>>>>>>>>thread while the first is blocked on a memory access...
>>>>>>>>>
>>>>>>>>>No, it won't be 2 times faster. Suppose you start crafty with 2 threads.
>>>>>>>>
>>>>>>>>I didn't say it would be _two_ times faster.
>>>>>>>>
>>>>>>>>I said it would be _faster_.
>>>>>>>>
>>>>>>>>And it will.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>thread A starts search and has 1.e4,e5
>>>>>>>>>thread B starts and continues with 1.d4
>>>>>>>>>
>>>>>>>>>now when A is ready, B will still be busy with its own search space,
>>>>>>>>>and delay thread A time and again.
>>>>>>>>>
>>>>>>>>>that'll slow things down incredibly.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>Except that isn't how it works.  The threads co-execute in an intermingled
>>>>>>>>way as one blocks for a memory read the other fills in the gap.  It is
>>>>>>>>something like having 1.5 cpus...  and it does work.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>You'll be a lot slower than searching with a single thread!
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>Not very likely...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Also note that there is just 8 KB of data cache and only about
>>>>>>>>>40 registers to rename variables, then another 12KB of trace cache.
>>>>>>>>>
>>>>>>>>>*both* threads are eating from that 8 KB and 12KB tracecache,
>>>>>>>>>that is an additional problem they 'overlook'.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>That is a problem on an SMP machine.  But _both_ threads are executing
>>>>>>>>the _same_ code anyway...  so that isn't a problem.  At least for me.
>>>>>>>>
>>>>>>>>For you it is different because you are not using "shared everything" in
>>>>>>>>lightweight threads, so your results might be different.  But all my threads
>>>>>>>>share the exact same executable instruction code...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>As you can see from the graphs, SMT usually brings zero speedup.
>>>>>>>>
>>>>>>>>I have seen numbers around 1.3 up to 1.5...  which is not to be
>>>>>>>>ignored.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Try crafty on a 2.4Ghz single cpu P4 or P4-Xeon please (northwood) or
>>>>>>>>>above. Not on a slower P4 or P4-Xeon. Of course we go for the latest
>>>>>>>>>hardware...
>>>>>>>>
>>>>>>>>
>>>>>>>>Why does it matter?  Hyper-Threading is Hyper-Threading, unless you are
>>>>>>>>going to start that memory speed nonsense.  And, in fact, the faster the
>>>>>>>>processor vs memory speed, the better hyperthreading should perform.  Just
>>>>>>>>like the greater the difference in processor speed vs disk speed, the better
>>>>>>>>normal operating systems do at running multiple processes.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Just try it like i tried at Jan Louwman's 2.4Ghz P4s and 2.53Ghz P4s.
>>>>>>>>
>>>>>>>>That says it all.  "Like I tried it".  As if that is a comprehensive and
>>>>>>>>exhaustive testing?
>>>>>>>>
>>>>>>>>>
>>>>>>>>>I can't measure *any* speedup *anyhow*.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>Why am I not surprised???
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Also theoretically i see major problems for the P4 chip even if you
>>>>>>>>>have software which could theoretically profit.
>>>>>>>>
>>>>>>>>
>>>>>>>>"theoretically".
>>>>>>>>
>>>>>>>>:)
>>>>>>>>
>>>>>>>>:)
>>>>>>>>
>>>>>>>>:)
>>>>>>>>
>>>>>>>>Theory from someone that doesn't know theory.
>>>>>>>>
>>>>>>>>:)
>>>>>>>>
>>>>>>>>:)
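
The spinlock fix mentioned in the quoted discussion is not spelled out in this thread. As a rough sketch only, assuming it refers to the commonly documented approach of putting the x86 PAUSE instruction inside the spin-wait loop, a hyper-threading-friendly lock in C11 could look something like this. The type and function names are hypothetical, and this is not Crafty's actual lock code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define HYPO_CPU_RELAX() _mm_pause()   /* emits the x86 PAUSE instruction */
#else
#define HYPO_CPU_RELAX() ((void)0)     /* no-op fallback on other CPUs */
#endif

/* Hypothetical lock type, for illustration only. */
typedef struct {
    atomic_flag flag;
} hypo_spinlock;

#define HYPO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Spin until the flag is acquired, hinting to the core on every failed
   attempt that this thread is only busy-waiting. */
static inline void hypo_spin_lock(hypo_spinlock *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->flag, memory_order_acquire)) {
        HYPO_CPU_RELAX();
    }
}

/* Try once; returns true if the lock was acquired. */
static inline bool hypo_spin_trylock(hypo_spinlock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->flag, memory_order_acquire);
}

/* Release the lock so a waiting thread can take it. */
static inline void hypo_spin_unlock(hypo_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->flag, memory_order_release);
}
```

The idea is that PAUSE marks the loop as a spin-wait, so a thread that is only waiting takes fewer of the shared execution resources away from the other logical processor on the same core.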



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.