Author: Vincent Diepeveen
Date: 07:07:07 11/03/02
On November 02, 2002 at 00:10:17, Robert Hyatt wrote:

On the P4, with 1 decoder, a 12K I-cache and just 8KB of data cache, I could
measure no speedup. Only slowdowns if I tried to run too many threads. Your
claims with Crafty prove it somehow fits within the trace cache. I also tested
on SINGLE CPU P4s and there I could measure no speedup at all. Only disasters.
I will test Crafty on those machines too.

>On November 01, 2002 at 17:26:39, Eugene Nalimov wrote:
>
>>Vincent,
>>
>>I am explaining it to you for the 3rd time: I can run tests on those systems,
>>but I have no physical access to them, so I cannot turn something in the BIOS
>>on or off. That's why I compared results of the 2.4GHz system with
>>hyperthreading on and the 2.8GHz one with hyperthreading off, to show that
>>the results are the same once you take the speed difference into account.
>>
>>Net result: you can look at the numbers I posted, and you will definitely see
>>that hyperthreading gives current Crafty, without any hyperthread-related
>>modifications, a double-digit improvement.
>>
>>Thanks,
>>Eugene
>
>
>You are wasting your time. He has made up his mind, declared hyper-threading
>worthless, and that is that.
>
>
>>
>>On November 01, 2002 at 17:07:10, Vincent Diepeveen wrote:
>>
>>>On November 01, 2002 at 14:55:50, Eugene Nalimov wrote:
>>>
>>>So you have a P4 2.8 available to you, then.
>>>
>>>Can you post the results of that P4 2.8 single CPU for the following 4
>>>runs:
>>>
>>>first:
>>> P4 2.8 SMT in BIOS off and
>>> a) MT 1
>>> b) MT 2
>>>
>>>secondly:
>>> P4 2.8 SMT in BIOS on and
>>> a) MT 1
>>> b) MT 2
>>>
>>>Thanks in advance,
>>>Vincent
>>>
>>>>Once again: the system I ran that test on is located in another building. I
>>>>don't want to bother my friend with rebooting/changing settings/etc. I ran the
>>>>test on a 2.8GHz P4 with hyperthreading turned off, and got 50 seconds at
>>>>1,113knps. 50*(2.8/2.4) == 58, so 57 seconds looks about right.
>>>>(I think it is slightly slower than the estimate because memory on the 2.4GHz
>>>>system is slower than on the 2.8GHz one.)
>>>>
>>>>I ran the same executable on an AMD/2000. It took 56 seconds at 994knps to run
>>>>the test, so 57 seconds at 976knps again looks right.
>>>>
>>>>Thanks,
>>>>Eugene
>>>>
>>>>On November 01, 2002 at 14:35:21, Vincent Diepeveen wrote:
>>>>
>>>>>On November 01, 2002 at 13:58:06, Eugene Nalimov wrote:
>>>>>
>>>>>>I cannot produce the test you are demanding, as I don't have physical access
>>>>>>to the system on which I run the test, but here are my results.
>>>>>>
>>>>>>Dual P4/2.4GHz, hyperthreading turned on, Windows XP Professional.
>>>>>>Unmodified Crafty 19.0 (i.e. with the "bad" spinlock loop).
>>>>>>"Bench" results (executable restarted after each test).
>>>>>
>>>>>Also do the tests with SMT disabled in the BIOS;
>>>>>it should produce the same results as MT 1 and MT 2.
>>>>>If not, then something else is wrong. With MT 4 it should
>>>>>produce something really bad.
>>>>>
>>>>>Amazing that at 976 knps with MT 1 you need only 57 seconds to finish the
>>>>>test. On a single-CPU AMD I need (with a somewhat older Crafty version, of
>>>>>course):
>>>>>
>>>>>White(1): hash 400MB
>>>>>hash table memory = 384M bytes.
>>>>>White(1): hashp 16MB
>>>>>pawn hash table memory = 10M bytes.
>>>>>White(1): bench
>>>>>Running benchmark. . .
>>>>>......
>>>>>Total nodes: 92683962
>>>>>Raw nodes per second: 827535
>>>>>Total elapsed time: 112
>>>>>SMP time-to-ply measurement: 5.714286
>>>>>White(1): quit
>>>>>execution complete.
>>>>>
>>>>>In short: 112 seconds (Visual C++ 6.0 SP4 with processor pack, default
>>>>>compile) and 827K nps.
>>>>>
>>>>>You need millions fewer nodes?
>>>>>
>>>>>>mt=1: 976knps, 57 seconds
>>>>>>mt=2: 1,705knps, 38 seconds
>>>>>>mt=4: 2,006knps, 35 seconds
>>>>>>
>>>>>>I.e. there is not only a ~17% raw nps speedup, but *absolute time* is also ~8%
>>>>>>smaller.
>>>>>>
>>>>>>And that is for the executable that is not hyperthread-aware, i.e. it contains
>>>>>>the bad spinlock loop.
>>>>>>
>>>>>>I tested exactly the executable that is on Bob's FTP site. You can download it
>>>>>>yourself.
>>>>>>
>>>>>>Thanks,
>>>>>>Eugene
>>>>>>
>>>>>>On November 01, 2002 at 13:06:53, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On November 01, 2002 at 12:20:14, Robert Hyatt wrote:
>>>>>>>
>>>>>>>Feel free to ship a version of Crafty that doesn't do spinlocks
>>>>>>>or whatever you want to modify. I'll extensively test it for you
>>>>>>>on all P4s I can get my hands on...
>>>>>>>
>>>>>>>I would be really amazed if you get even 0.1% faster in nodes per
>>>>>>>second...
>>>>>>>
>>>>>>>...of course it must be a fair comparison, in contrast to what
>>>>>>>Intel shows. They do the following comparison:
>>>>>>>
>>>>>>> a) some feature called 'SMT' in the BIOS turned on
>>>>>>> - just running 2 threads then
>>>>>>> b) turning it off
>>>>>>> - also running 2 threads on it
>>>>>>>
>>>>>>>Anyone who is not so naive knows that you also need
>>>>>>>to do the following test:
>>>>>>>
>>>>>>> a) some feature called 'SMT' in the BIOS turned on
>>>>>>> - just running 1 thread eating all system time
>>>>>>> b) turning it off
>>>>>>> - also running 1 thread eating all system time
>>>>>>>
>>>>>>>There shouldn't be a speed difference between a and b, of course.
>>>>>>>
>>>>>>>That verification step is missing.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>On November 01, 2002 at 11:56:56, Vincent Diepeveen wrote:
>>>>>>>>
>>>>>>>>>On November 01, 2002 at 10:41:25, Robert Hyatt wrote:
>>>>>>>>>
>>>>>>>>>>On October 31, 2002 at 10:53:07, Vincent Diepeveen wrote:
>>>>>>>>>>
>>>>>>>>>>>On October 30, 2002 at 06:59:21, Terje Vagle wrote:
>>>>>>>>>>>
>>>>>>>>>>>>Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>>The new CPU from Intel will have a new feature called
>>>>>>>>>>>>hyper-threading.
>>>>>>>>>>>>
>>>>>>>>>>>>This will make the operating system recognize the CPU as if it were
>>>>>>>>>>>>2 CPUs.
>>>>>>>>>>>>
>>>>>>>>>>>>Could programs with SMP support make use of this?
>>>>>>>>>>>>
>>>>>>>>>>>>Regards,
>>>>>>>>>>>>
>>>>>>>>>>>>Terje Vagle
>>>>>>>>>>>
>>>>>>>>>>>No, chess programs cannot make use of that feature at all. It is sad but
>>>>>>>>>>>true. Hyperthreading is a cool thing for the future, but the P4 is too
>>>>>>>>>>>small a processor to allow hyperthreading to work.
>>>>>>>>>>>
>>>>>>>>>>>Apart from that, a major problem is that even if we had a great processor
>>>>>>>>>>>which really allows hyperthreading to be effective, the threads would
>>>>>>>>>>>run at unequal speeds.
>>>>>>>>>>>
>>>>>>>>>>>Hyperthreading is supposed to work for 2 threads where 1 is a fast
>>>>>>>>>>>thread and the other is some kind of background thread eating little CPU
>>>>>>>>>>>time.
>>>>>>>>>>>
>>>>>>>>>>>In chess programs, a second search thread which just runs now and
>>>>>>>>>>>then in the background is simply impossible to use.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>It is not impossible at all. The only problem was spinlocks, and Eugene
>>>>>>>>>>posted a link to an Intel document that describes how to solve this problem.
>>>>>>>>>>
>>>>>>>>>>Given that solution, hyper-threading will work just fine since spinlocks
>>>>>>>>>>won't confuse the processor...
>>>>>>>>>>
>>>>>>>>>>It won't be 2x faster, but it will certainly be faster if you can run a second
>>>>>>>>>>thread while the first is blocked on a memory access...
>>>>>>>>>
>>>>>>>>>No, it won't be 2 times faster. Suppose you start Crafty with 2 threads.
>>>>>>>>
>>>>>>>>I didn't say it would be _two_ times faster.
>>>>>>>>
>>>>>>>>I said it would be _faster_.
>>>>>>>>
>>>>>>>>And it will.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Thread A starts the search and has 1.e4 e5,
>>>>>>>>>thread B starts and continues with 1.d4.
>>>>>>>>>
>>>>>>>>>Now when A is ready, B will still be busy with its own search space,
>>>>>>>>>and will delay thread A time and again.
>>>>>>>>>
>>>>>>>>>That will slow things down incredibly.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>Except that isn't how it works. The threads co-execute in an intermingled
>>>>>>>>way: as one blocks on a memory read, the other fills in the gap. It is
>>>>>>>>something like having 1.5 CPUs... and it does work.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>You'll be a lot slower than searching with a single thread!
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>Not very likely...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Also note that there is just 8 KB of data cache and only about
>>>>>>>>>40 registers for renaming variables, then another 12KB trace cache.
>>>>>>>>>
>>>>>>>>>*Both* threads are eating from that 8 KB data cache and 12KB trace cache;
>>>>>>>>>that is an additional problem they 'overlook'.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>That is a problem on an SMP machine. But _both_ threads are executing
>>>>>>>>the _same_ code anyway... so that isn't a problem. At least for me.
>>>>>>>>
>>>>>>>>For you it is different because you are not using "shared everything" in
>>>>>>>>lightweight threads, so your results might be different. But all my threads
>>>>>>>>share the exact same executable instruction code...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>As you can see from the graphs, SMT usually brings zero speedup.
>>>>>>>>
>>>>>>>>I have seen numbers around 1.3 up to 1.5... which is not to be
>>>>>>>>ignored.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Please try Crafty on a 2.4GHz single-CPU P4 or P4 Xeon (Northwood) or
>>>>>>>>>faster. Not on a slower P4 or P4 Xeon. Of course we go for the latest
>>>>>>>>>hardware...
>>>>>>>>
>>>>>>>>
>>>>>>>>Why does it matter? Hyper-Threading is Hyper-Threading, unless you are
>>>>>>>>going to start that memory speed nonsense. And, in fact, the faster the
>>>>>>>>processor vs memory speed, the better hyperthreading should perform. Just
>>>>>>>>like the greater the difference in processor speed vs disk speed, the better
>>>>>>>>normal operating systems do at running multiple processes.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Just try it like I tried it on Jan Louwman's 2.4GHz P4s and 2.53GHz P4s.
>>>>>>>>
>>>>>>>>That says it all. "Like I tried it". As if that were comprehensive and
>>>>>>>>exhaustive testing?
>>>>>>>>
>>>>>>>>>
>>>>>>>>>I can't measure *any* speedup *anyhow*.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>Why am I not surprised???
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Also, theoretically, I see major problems for the P4 chip even if you
>>>>>>>>>have software which could theoretically profit.
>>>>>>>>
>>>>>>>>
>>>>>>>>"theoretically".
>>>>>>>>
>>>>>>>>:)
>>>>>>>>
>>>>>>>>:)
>>>>>>>>
>>>>>>>>:)
>>>>>>>>
>>>>>>>>Theory from someone that doesn't know theory.
>>>>>>>>
>>>>>>>>:)
>>>>>>>>
>>>>>>>>:)
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.