Author: Vincent Diepeveen
Date: 14:07:10 11/01/02
On November 01, 2002 at 14:55:50, Eugene Nalimov wrote:

So you have a P4 2.8 then to your avail. Can you post the results of
that P4 2.8 single cpu for the next 4 results:

first: P4 2.8 SMT in bios off and
  a) MT 1
  b) MT 2

secondly: P4 2.8 SMT in bios on and
  a) MT 1
  b) MT 2

Thanks in advance,
Vincent

>Once again: the system I run that test on is located in another building. I don't
>want to bother the friend with rebooting/changing settings/etc. I run the test
>on a 2.8GHz P4 with hyperthreading turned off, and got 50 seconds at 1,113knps.
>50*(2.8/2.4) == 58, so 57 seconds looks about right. (I think it is slightly
>slower than estimate because memory on 2.4GHz system is slower than on 2.8GHz
>one).
>
>I run the same executable on AMD/2000. It took 56 seconds at 994knps to run the
>test, so 57 seconds at 976knps again looks right.
>
>Thanks,
>Eugene
>
>On November 01, 2002 at 14:35:21, Vincent Diepeveen wrote:
>
>>On November 01, 2002 at 13:58:06, Eugene Nalimov wrote:
>>
>>>I cannot produce the test you are demanding, as I don't have physical access to
>>>the system on which I run the test, but here are my results.
>>>
>>>Dual P4/2.4GHz, hyperthreading turned on, Windows XP Professional.
>>>Unmodified Crafty 19.0 (i.e. with "bad" spinlock loop).
>>>"Bench" results (executable restarted after each test).
>>
>>also do the tests with SMT disabled in bios,
>>it should produce the same results as in MT 1 and MT 2.
>>If not then something different is wrong. In MT 4 it should
>>produce something real bad there.
>>
>>Amazing that with 976 MT 1 you need only 57 seconds to finish the
>>test. Single cpu AMD i need (but of course a bit older crafty version):
>>
>>White(1): hash 400MB
>>hash table memory = 384M bytes.
>>White(1): hashp 16MB
>>pawn hash table memory = 10M bytes.
>>White(1): bench
>>Running benchmark. . .
>>......
>>Total nodes: 92683962
>>Raw nodes per second: 827535
>>Total elapsed time: 112
>>SMP time-to-ply measurement: 5.714286
>>White(1): quit
>>execution complete.
>>
>>Or in short 112 seconds (visual c++ 6.0 sp4 proc pack default compile)
>>and 827 K nps.
>>
>>You need millions of nodes less?
>>
>>>mt=1: 976knps, 57 seconds
>>>mt=2: 1,705knps, 38 seconds
>>>mt=4: 2,006knps, 35 seconds
>>>
>>>I.e. there is not only ~17% raw nps speedup, but *absolute time* is also ~8%
>>>smaller.
>>>
>>>And that is for the executable that is non-hyperthread aware, i.e. contains bad
>>>spinlock loop.
>>>
>>>I tested exactly the executable that is on Bob's FTP site. You can download it
>>>yourself.
>>>
>>>Thanks,
>>>Eugene
>>>
>>>On November 01, 2002 at 13:06:53, Vincent Diepeveen wrote:
>>>
>>>>On November 01, 2002 at 12:20:14, Robert Hyatt wrote:
>>>>
>>>>Feel free to ship a version of crafty that doesn't do spinlock
>>>>or whatever you want to modify. I'll extensively test it for you
>>>>at all P4s i can get my hands on...
>>>>
>>>>I would be really amazed if you get even 0.1% faster in nodes a
>>>>second...
>>>>
>>>>...of course it must be a fair compare in contradiction to what
>>>>intel shows. They do next comparison
>>>>
>>>>  a) some feature called 'SMT' in the bios turned on
>>>>     - just running 2 threads then
>>>>  b) turning it off
>>>>     - also running 2 threads at it
>>>>
>>>>Like everyone who is not so naive we know that you also need
>>>>to do next test:
>>>>
>>>>  a) some feature called 'SMT' in the bios turned on
>>>>     - just running 1 thread eating all system time
>>>>  b) turning it off
>>>>     - also running 1 thread eating all system time
>>>>
>>>>There shouldn't be a speed difference between a and b of course.
>>>>
>>>>That verification step is missing.
>>>>
>>>>>On November 01, 2002 at 11:56:56, Vincent Diepeveen wrote:
>>>>>
>>>>>>On November 01, 2002 at 10:41:25, Robert Hyatt wrote:
>>>>>>
>>>>>>>On October 31, 2002 at 10:53:07, Vincent Diepeveen wrote:
>>>>>>>
>>>>>>>>On October 30, 2002 at 06:59:21, Terje Vagle wrote:
>>>>>>>>
>>>>>>>>>Hi all,
>>>>>>>>>
>>>>>>>>>The new cpu from intel will have a new function called
>>>>>>>>>hyper-threading.
>>>>>>>>>
>>>>>>>>>This will make the operating system able to recognize the cpu as if it was
>>>>>>>>>2 cpu's.
>>>>>>>>>
>>>>>>>>>Could the programs with smp-support make use of this?
>>>>>>>>>
>>>>>>>>>Regards,
>>>>>>>>>
>>>>>>>>>Terje Vagle
>>>>>>>>
>>>>>>>>No, chessprograms cannot make use of that feature at all. It is sad but
>>>>>>>>the truth. Hyperthreading is a cool thing for the future but the P4
>>>>>>>>processor is a too small processor to allow hyperthreading from getting
>>>>>>>>to work.
>>>>>>>>
>>>>>>>>Apart from that a major problem is that even if we have a great processor
>>>>>>>>which really allows hyperthreading to be effective, that the threads
>>>>>>>>run at unequal speeds.
>>>>>>>>
>>>>>>>>Hyper threading is supposed to work for 2 threads where 1 is a fast
>>>>>>>>thread and the other is some kind of background thread eating little cpu
>>>>>>>>time.
>>>>>>>>
>>>>>>>>In chessprograms having a second search thread which just runs now and
>>>>>>>>then in the background is simply impossible to use.
>>>>>>>
>>>>>>>It is not impossible at all. The only problem was spinlocks and Eugene
>>>>>>>posted a link to an Intel document that describes how to solve this problem.
>>>>>>>
>>>>>>>Given that solution, hyper-threading will work just fine since spinlocks
>>>>>>>won't confuse the processor...
>>>>>>>
>>>>>>>It won't be 2x faster, but it will certainly be faster if you can run a second
>>>>>>>thread while the first is blocked on a memory access...
>>>>>>
>>>>>>No it won't be 2 times faster. suppose you start crafty with 2 threads.
>>>>>
>>>>>I didn't say it would be _two_ times faster.
>>>>>
>>>>>I said it would be _faster_.
>>>>>
>>>>>And it will.
>>>>>
>>>>>>
>>>>>>thread A starts search and has 1.e4,e5
>>>>>>thread B starts and continues with 1.d4
>>>>>>
>>>>>>now when A is ready, B will still be busy with its own search space,
>>>>>>and delay thread A time and again.
>>>>>>
>>>>>>that'll slow down incredibly.
>>>>>>
>>>>>
>>>>>Except that isn't how it works. The threads co-execute in an intermingled
>>>>>way as one blocks for a memory read the other fills in the gap. It is
>>>>>something like having 1.5 cpus... and it does work.
>>>>>
>>>>>>You'll be a lot slower than searching with a single thread!
>>>>>>
>>>>>
>>>>>Not very likely...
>>>>>
>>>>>>Also note that there is just 8 KB data cache and just like
>>>>>>40 registers to rename variables. then another 12KB tracecache.
>>>>>>
>>>>>>*both* threads are eating from that 8 KB and 12KB tracecache,
>>>>>>that is an additional problem they 'overlook'.
>>>>>>
>>>>>
>>>>>That is a problem on an SMP machine. But _both_ threads are executing
>>>>>the _same_ code anyway... so that isn't a problem. At least for me.
>>>>>
>>>>>For you it is different because you are not using "shared everything" in
>>>>>lightweight threads, so your results might be different. But all my threads
>>>>>share the exact same executable instruction code...
>>>>>
>>>>>>As you can see from graphs. Usually SMT brings zero speedup.
>>>>>
>>>>>I have seen numbers around 1.3 up to 1.5... which is not to be
>>>>>ignored.
>>>>>
>>>>>>
>>>>>>Try crafty on a 2.4Ghz single cpu P4 or P4-Xeon please (northwood) or
>>>>>>above. Not on a slower P4 or P4-Xeon. Of course we go for the latest
>>>>>>hardware...
>>>>>
>>>>>Why does it matter? Hyper-Threading is Hyper-Threading, unless you are
>>>>>going to start that memory speed nonsense. And, in fact, the faster the
>>>>>processor vs memory speed, the better hyperthreading should perform. Just
>>>>>like the greater the difference in processor speed vs disk speed, the better
>>>>>normal operating systems do at running multiple processes.
>>>>>
>>>>>>
>>>>>>Just try it like i tried at Jan Louwman's 2.4Ghz P4s and 2.53Ghz P4s.
>>>>>
>>>>>That says it all. "Like I tried it". As if that is a comprehensive and
>>>>>exhaustive testing?
>>>>>
>>>>>>
>>>>>>I can't measure *any* speedup *anyhow*.
>>>>>>
>>>>>
>>>>>Why am I not surprised???
>>>>>
>>>>>>Also theoretically i see major problems for the P4 chip even if you
>>>>>>have software which could theoretically profit.
>>>>>
>>>>>"theoretically".
>>>>>
>>>>>:)
>>>>>
>>>>>:)
>>>>>
>>>>>:)
>>>>>
>>>>>Theory from someone that doesn't know theory.
>>>>>
>>>>>:)
>>>>>
>>>>>:)
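
For anyone who wants to re-check the arithmetic quoted in this thread, here is a minimal sketch in Python. The figures are copied from the posts above. The helper names (`scale_time`, `percent_change`) and variable names are made up for illustration, and the clock-scaling step assumes run time is inversely proportional to clock speed, which is only an approximation (Eugene himself notes that memory speed differs between the two systems).

```python
# Figures are copied from the posts in this thread; helper and variable
# names are illustrative only.

def scale_time(seconds, measured_ghz, target_ghz):
    """Project a run time to another clock speed, assuming time is
    inversely proportional to clock frequency (an idealisation)."""
    return seconds * measured_ghz / target_ghz

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

# Eugene's estimate: 50 s measured at 2.8 GHz, projected to 2.4 GHz.
est_24 = scale_time(50, 2.8, 2.4)

# Dual P4/2.4GHz bench results: mt -> (knps, seconds).
bench = {1: (976, 57), 2: (1705, 38), 4: (2006, 35)}
nps_gain = percent_change(bench[2][0], bench[4][0])   # nps, mt=2 -> mt=4
time_cut = -percent_change(bench[2][1], bench[4][1])  # time, mt=2 -> mt=4

# Vincent's run: total nodes / raw nps should match the elapsed time.
vincent_seconds = 92683962 / 827535

# Rough node count implied by Eugene's mt=1 line; knps and seconds are
# both rounded in the post, so this is only an approximation.
implied_nodes = bench[1][0] * 1000 * bench[1][1]

print(f"projected 2.4 GHz time: {est_24:.1f} s")
print(f"nps gain mt=2 -> mt=4:  {nps_gain:.1f} %")
print(f"time cut mt=2 -> mt=4:  {time_cut:.1f} %")
print(f"Vincent nodes/nps:      {vincent_seconds:.1f} s")
print(f"implied nodes, mt=1:    {implied_nodes}")
```

Under these assumptions the projected time, the nps gain and the time reduction land close to the rounded figures Eugene quotes. The implied node count for his mt=1 run is well below the total in Vincent's log, which is the gap Vincent is asking about. The two runs used different Crafty versions, so the counts are not directly comparable.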
Last modified: Thu, 15 Apr 21 08:11:13 -0700