Author: Robert Hyatt
Date: 17:00:11 12/10/02
On December 10, 2002 at 16:43:29, Matt Taylor wrote:

>On December 10, 2002 at 16:35:11, Robert Hyatt wrote:
>
>>On December 10, 2002 at 14:31:51, Matt Taylor wrote:
>>
>>>On December 10, 2002 at 13:18:45, Robert Hyatt wrote:
>>>
>>>>On December 10, 2002 at 12:31:46, Matt Taylor wrote:
>>>>
>>>>>On December 10, 2002 at 12:21:33, Robert Hyatt wrote:
>>>>>
>>>>>>On December 10, 2002 at 11:34:45, Jeremiah Penery wrote:
>>>>>>
>>>>>>>On December 10, 2002 at 10:57:40, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On December 10, 2002 at 09:08:10, Vincent Diepeveen wrote:
>>>>>>>>
>>>>>>>>>Matt i don't know it for crafty or other crap products. Crafty as we
>>>>>>>>>see in test needs less nodes when running MT=2,
>>>>>>>>
>>>>>>>>I realize this is hard for you to do, but is it _possible_ that you can stick
>>>>>>>>to _real_ data when you post? The above is _absolute_ crap. Crafty does
>>>>>>>>_not_ "need less nodes when MT=2". In some positions, yes, but in
>>>>>>>>more positions it needs _more_. And for the average case it needs _more_.
>>>>>>>>
>>>>>>>>I don't know why you continue to post something that any person here can
>>>>>>>>refute simply by running the code. I've done it for you many times. The
>>>>>>>>above is false. Please find something _else_ to wave your hands about.
>>>>>>>
>>>>>>>It came from the original data in this thread:
>>>>>>
>>>>>>So? That is over 6 positions. Using that to prove that a program searches
>>>>>>"fewer nodes with mt=2" is total nonsense, as is the claim that a program
>>>>>>+will+ search fewer nodes overall using two threads. It simply doesn't
>>>>>>happen. And it falls in the same class as the perpetual-motion machine...
>>>>>>It doesn't work...
>>>>>
>>>>>I like Cold Fusion a little better.
>>>>
>>>>I'm not going that far. There is always a remote possibility that something
>>>>like that might be possible given the right materials and conditions.
>>>>Perpetual motion is another thing entirely, as is a speedup > 2.0 with two
>>>>processors. :)
>>>
>>>Yeah. I like the Cold Fusion example because the data does not justify the
>>>claim. But yeah, it is difficult to see how a second processor would possibly
>>>create a speed-up of more than a factor of 2. Obviously if that (legitimately)
>>>happens, more than just the number of CPUs has changed.
>>>
>>>>>>>Crafty v18.15
>>>>>>>White(1): bench
>>>>>>>Running benchmark. . .
>>>>>>>......
>>>>>>>Total nodes: 97487547
>>>>>>>Raw nodes per second: 1160566
>>>>>>>Total elapsed time: 84
>>>>>>>SMP time-to-ply measurement: 7.619048
>>>>>>>White(1):
>>>>>>>-------------------------------------
>>>>>>>Crafty v18.15 (2 cpus)
>>>>>>>White(1): bench
>>>>>>>Running benchmark. . .
>>>>>>>......
>>>>>>>Total nodes: 94658095
>>>>>>>Raw nodes per second: 1314695
>>>>>>>Total elapsed time: 72
>>>>>>>SMP time-to-ply measurement: 8.888889
>>>>>>>
>>>>>>>
>>>>>>>>What is "a buggy crafty?" And what is the 13-16%? I posted _real_ data. You
>>>>>>>>post fantasy without even having access to a box? And that is fact???
>>>>>>>
>>>>>>>You can see also that the NPS speedup in that above data is 13%.
>>>>>>
>>>>>>For _one_ test... With a version of the program that has a _known_ problem
>>>>>>with SMT.
>>>>>
>>>>>You mean the pause issue, or is there more than just that?
>>>>>
>>>>>-Matt
>>>>
>>>>Yes.... but not just in the Lock() code... there is a critical spin-wait that
>>>>needs a pause, otherwise one thread will be running in a spin-wait while the
>>>>other thread is waiting to get scheduled, and _it_ is the one that will give
>>>>the "spinner" something to work on. :)
>>>
>>>Ah. I'm interested in seeing the results, but I'm not expecting a huge gain from
>>>using pause. If one thread is beating on the lock, it leaves the majority of the
>>>execution resources and bandwidth for the other logical thread.
>>>I don't think that reducing the polling rate of the L1 cache will affect
>>>results much.
>>>
>>>I guess the only thing we can say right now is, "We will see!"
>>>
>>>-Matt
>>
>>Think about it for a minute. You have two processes to schedule. One is doing
>>something useful, the other is busy spinning. So every chance the "spinner"
>>gets, it executes full-speed ahead. And while it is executing, the _other_
>>thread is sitting. The CPU has a 50% chance of choosing the _wrong_ thread
>>when one is computing doing useful work and the other is spinning doing
>>nothing but waiting on something to do...
>>
>>and that is what pause helps with, the "spinner" makes one pass thru the spin
>>loop and then says "run the other thread now"...
>
>That's true for a scheduler on a single processor, but that's not how
>Hyperthreading works as I understand it. Then again, it is possible that the
>docs I read are wrong. (The last thing I read about HT was over 2 years ago.)

That is the way I have seen it described in various Intel white papers. In
particular, they refer to the CPU's "resource scheduler" and compare it to a
multiprogramming operating system that is running two processes concurrently.

>They said that HT allows -concurrent- scheduling of threads, but the threads
>obviously cannot make use of the same execution resources. If this is correct,
>one thread would be spinning (consuming bandwidth to the L1 cache) while the
>other thread was doing real work.

Again, think about what you just said; what you describe can't happen. If one
thread is smoking the L1/L2 cache, then it is not waiting for _anything_, and
once it is scheduled it will execute until the CPU decides to flip to the other
thread. Or until that thread does a pause. Whichever comes first.

>For now I'm going to stick to what I have read. I'll poke around sometime later
>this week and see if I can find any updated material on the inner workings of
>HT.
>
>-Matt
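[Editor's note: the spin-wait-plus-pause idea debated above can be sketched in C. This is a hedged illustration, not Crafty's actual Lock() code: the function names, the use of C11 `atomic_flag`, and the inline-assembly pause are assumptions about the technique being described, namely a test-and-set spin lock whose busy loop executes the x86 `pause` instruction so that on a Hyper-Threaded (SMT) core the spinning logical thread hints the CPU to hand execution resources to its sibling.]

```c
/* Hypothetical sketch of a spin lock with a pause hint.
 * NOT Crafty's real Lock(); names and structure are illustrative.
 * Assumes C11 atomics and a GCC-compatible compiler. */
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

/* On x86, "pause" tells the CPU this is a spin-wait loop, so an SMT
 * core can favor the sibling logical thread instead of letting the
 * spinner burn shared execution resources at full speed. */
static inline void cpu_pause(void) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause");
#endif
}

void Lock(void) {
    /* Spin until the flag was previously clear (i.e., we acquired it),
     * pausing once per iteration of the spin loop. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        cpu_pause();
}

void Unlock(void) {
    atomic_flag_clear_explicit(&lock, memory_order_release);
}
```

Without the pause, the spinning logical thread competes at full speed for the shared pipeline, which matches Hyatt's point above: the hint is what makes the "spinner" step aside for the thread that will eventually give it work.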