Author: Eugene Nalimov
Date: 11:22:10 11/01/02
Sorry, after I wrote the previous message I remembered about parallel non-determinism, so I reran the tests 5 times on the same system and calculated the averages.

mt=2: times are 42, 43, 39, 40, 38, average is 40.
mt=4: times are 36, 31, 37, 29, 38, average is 34.

So the average speedup is even greater than I estimated earlier, something like ~15%.

Thanks,
Eugene

On November 01, 2002 at 13:58:06, Eugene Nalimov wrote:

>I cannot produce the test you are demanding, as I don't have physical access to
>the system on which I run the test, but here are my results.
>
>Dual P4/2.4GHz, hyperthreading turned on, Windows XP Professional.
>Unmodified Crafty 19.0 (i.e. with "bad" spinlock loop).
>"Bench" results (executable restarted after each test).
>
>mt=1: 976knps, 57 seconds
>mt=2: 1,705knps, 38 seconds
>mt=4: 2,006knps, 35 seconds
>
>I.e. there is not only ~17% raw nps speedup, but *absolute time* is also ~8%
>smaller.
>
>And that is for the executable that is non-hyperthread aware, i.e. contains bad
>spinlock loop.
>
>I tested exactly the executable that is on Bob's FTP site. You can download it
>yourself.
>
>Thanks,
>Eugene
>
>On November 01, 2002 at 13:06:53, Vincent Diepeveen wrote:
>
>>On November 01, 2002 at 12:20:14, Robert Hyatt wrote:
>>
>>Feel free to ship a version of crafty that doesn't do spinlock
>>or whatever you want to modify. I'll extensively test it for you
>>at all P4s i can get my hands on...
>>
>>I would be really amazed if you get even 0.1% faster in nodes a
>>second...
>>
>>...of course it must be a fair compare in contradiction to what
>>intel shows.
>>They do next comparison
>>
>> a) some feature called 'SMT' in the bios turned on
>>    - just running 2 threads then
>> b) turning it off
>>    - also running 2 threads at it
>>
>>Like everyone who is not so naive we know that you also need
>>to do next test:
>>
>> a) some feature called 'SMT' in the bios turned on
>>    - just running 1 thread eating all system time
>> b) turning it off
>>    - also running 1 thread eating all system time
>>
>>There shouldn't be a speed difference between a and b of course.
>>
>>That verification step is missing.
>>
>>>On November 01, 2002 at 11:56:56, Vincent Diepeveen wrote:
>>>
>>>>On November 01, 2002 at 10:41:25, Robert Hyatt wrote:
>>>>
>>>>>On October 31, 2002 at 10:53:07, Vincent Diepeveen wrote:
>>>>>
>>>>>>On October 30, 2002 at 06:59:21, Terje Vagle wrote:
>>>>>>
>>>>>>>Hi all,
>>>>>>>
>>>>>>>The new cpu from intel will have a new function called
>>>>>>>hyper-threading.
>>>>>>>
>>>>>>>This will make the operating system able to recognize the cpu as if it was
>>>>>>>2 cpu's.
>>>>>>>
>>>>>>>Could the programs with smp-support make use of this?
>>>>>>>
>>>>>>>Regards,
>>>>>>>
>>>>>>>Terje Vagle
>>>>>>
>>>>>>No chessprograms cannot make use of that feature at all. It is sad but
>>>>>>the truth. Hyperthreading is a cool thing for the future but the P4
>>>>>>processor is a too small processor to allow hyperthreading from getting
>>>>>>to work.
>>>>>>
>>>>>>Apart from that a major problem is that even if we have a great processor
>>>>>>which really allows hyperthreading to be effective, that the threads
>>>>>>run at unequal speeds.
>>>>>>
>>>>>>Hyper threading is supposed to work for 2 threads where 1 is a fast
>>>>>>thread and the other is some kind of background thread eating little cpu
>>>>>>time.
>>>>>>
>>>>>>In chessprograms having a second search thread which just runs now and
>>>>>>then in the background is simply impossible to use.
>>>>>
>>>>>
>>>>>It is not impossible at all.
>>>>>The only problem was spinlocks and Eugene
>>>>>posted a link to an Intel document that describes how to solve this problem.
>>>>>
>>>>>Given that solution, hyper-threading will work just fine since spinlocks
>>>>>won't confuse the processor...
>>>>>
>>>>>It won't be 2x faster, but it will certainly be faster if you can run a second
>>>>>thread while the first is blocked on a memory access...
>>>>
>>>>No it won't be 2 times faster. suppose you start crafty with 2 threads.
>>>
>>>I didn't say it would be _two_ times faster.
>>>
>>>I said it would be _faster_.
>>>
>>>And it will.
>>>
>>>>
>>>>thread A starts search and has 1.e4,e5
>>>>thread B starts and continues with 1.d4
>>>>
>>>>now when A is ready, B will still be busy with its own search space,
>>>>and delay thread A time and again.
>>>>
>>>>that'll slow down incredible.
>>>>
>>>
>>>
>>>Except that isn't how it works. The threads co-execute in an intermingled
>>>way as one blocks for a memory read the other fills in the gap. It is
>>>something like having 1.5 cpus... and it does work.
>>>
>>>>You'll be a lot slower than searching with a single thread!
>>>>
>>>
>>>
>>>Not very likely...
>>>
>>>>Also note that there is just 8 KB data cache and just like
>>>>40 registers to rename variables. then another 12KB tracecache.
>>>>
>>>>*both* threads are eating from that 8 KB and 12KB tracecache,
>>>>that is an additional problem they 'overlook'.
>>>>
>>>
>>>
>>>That is a problem on an SMP machine. But _both_ threads are executing
>>>the _same_ code anyway... so that isn't a problem. At least for me.
>>>
>>>For you it is different because you are not using "shared everything" in
>>>lightweight threads, so your results might be different. But all my threads
>>>share the exact same executable instruction code...
>>>
>>>>As you can see from graphs. Usually SMT brings zero speedup.
>>>
>>>I have seen numbers around 1.3 up to 1.5... which is not to be
>>>ignored.
>>>
>>>>
>>>>Try crafty on a 2.4Ghz single cpu P4 or P4-Xeon please (northwood) or
>>>>above. Not on a slower P4 or P4-Xeon. Of course we go for the latest
>>>>hardware...
>>>
>>>
>>>Why does it matter? Hyper-Threading is Hyper-Threading, unless you are
>>>going to start that memory speed nonsense. And, in fact, the faster the
>>>processor vs memory speed, the better hyperthreading should perform. Just
>>>like the greater the difference in processor speed vs disk speed, the better
>>>normal operating systems do at running multiple processes.
>>>
>>>>
>>>>Just try it like i tried at Jan Louwman's 2.4Ghz P4s and 2.53Ghz P4s.
>>>
>>>That says it all. "Like I tried it". As if that is a comprehensive and
>>>exhaustive testing?
>>>
>>>>
>>>>I can't measure *any* speedup *anyhow*.
>>>>
>>>
>>>
>>>Why am I not surprised???
>>>
>>>>Also theoretically i see major problems for the P4 chip even if you
>>>>have software which could theoretically profit.
>>>
>>>
>>>"theoretically".
>>>
>>>:)
>>>
>>>:)
>>>
>>>:)
>>>
>>>Theory from someone that doesn't know theory.
>>>
>>>:)
>>>
>>>:)
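
A quick way to check the averages reported at the top of this message is the sketch below. It is only an illustration in Python (the thread itself contains no code), and it assumes the first series of run times is the mt=2 run and the second is the mt=4 run, which matches the single-run results quoted from the earlier post. The variable names are made up for the example.

```python
from statistics import mean

# Run times in seconds, copied from the top of this message.
# Assumption: first series is mt=2, second series is mt=4.
mt2_times = [42, 43, 39, 40, 38]
mt4_times = [36, 31, 37, 29, 38]

avg_mt2 = mean(mt2_times)
avg_mt4 = mean(mt4_times)

# Fraction of wall-clock time saved by going from mt=2 to mt=4.
time_saved = (avg_mt2 - avg_mt4) / avg_mt2

# The same comparison expressed as a speedup factor.
speedup = avg_mt2 / avg_mt4

print(f"mt=2 average: {avg_mt2:.1f} s")
print(f"mt=4 average: {avg_mt4:.1f} s")
print(f"time saved:   {time_saved:.1%}")
print(f"speedup:      {speedup:.2f}x")
```

Rounded, this gives averages of about 40 and 34 seconds, i.e. roughly 15% less wall-clock time for mt=4, or equivalently a speedup of about 1.18x.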
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.