Author: Robert Hyatt
Date: 09:13:35 04/16/03
On April 16, 2003 at 01:37:08, Tom Kerrigan wrote:

>On April 16, 2003 at 00:01:34, Robert Hyatt wrote:
>
>>On April 15, 2003 at 14:56:38, Tom Kerrigan wrote:
>>
>>>On April 14, 2003 at 23:41:10, Robert Hyatt wrote:
>>>
>>>>On April 14, 2003 at 18:19:28, Ricardo Gibert wrote:
>>>>
>>>>>On April 14, 2003 at 17:54:14, Robert Hyatt wrote:
>>>>>
>>>>>>On April 14, 2003 at 15:50:22, Tom Kerrigan wrote:
>>>>>>
>>>>>>>On April 13, 2003 at 11:21:51, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On April 13, 2003 at 02:37:57, Tom Kerrigan wrote:
>>>>>>>>
>>>>>>>>>On April 13, 2003 at 01:04:52, Robert Hyatt wrote:
>>>>>>>>>
>>>>>>>>>>It _is_ pinned on SMT. The two logical processors are producing wildly
>>>>>>>>>>imbalanced results when using threads, vs using two separate processes. It
>>>>>>>>>>would appear to be cache-related...
>>>>>>>>>
>>>>>>>>>This is some sort of joke, right? You and Vincent see the same behavior, you
>>>>>>>>>have SMT and Vincent doesn't, and somehow the problem is with SMT?
>>>>>>>>.
>>>>>>>>
>>>>>>>>
>>>>>>>>The _variability_ is with SMT. What are you talking about? I reported _two_
>>>>>>>>issues.
>>>>>>>>
>>>>>>>>1. My dual xeon runs two copies of crafty about 2x as fast as if they were
>>>>>>>>run one after the other. So does my quad 700.
>>>>>>>>
>>>>>>>>2. My dual xeon runs one copy, two threads, at about 1.5X the speed that it
>>>>>>>>should.
>>>>>>>>
>>>>>>>>That is a problem.
>>>>>>>>
>>>>>>>>The second issue is that my dual xeon does _not_ run threaded crafty in a
>>>>>>>>balanced way on two logical processors. For two independent copies, it
>>>>>>>>varies from 50-50 to 45-55. Not unreasonable. But for the single threaded
>>>>>>>>copy, it varies all the way to 70-30. _that_ is an SMT issue. Probably, as
>>>>>>>>I mentioned, caused by some unknown L2 cache issue. But it _is_ a problem
>>>>>>>>with SMT if you want to assume that normally it is about 50-50 roughly, for
>>>>>>>>_regular_ applications.
>>>>>>>>
>>>>>>>>shared memory, locks, etc are causing something strange to happen.
>>>>>>>
>>>>>>>It looks like you're having enough problems and unexplained behavior already
>>>>>>>that it's hard to trust any sort of numbers you post. But still, if the widest
>>>>>>>disparity you measured was 70-30, that seems like enough to dispel your notion
>>>>>>>that one thread always gets priority over the other.
>>>>>>>
>>>>>>>-Tom
>>>>>>
>>>>>>
>>>>>>How?
>>>>>>
>>>>>>70-30 is > 2:1.
>>>>>>
>>>>>>Something is going on.
>>>>>
>>>>>
>>>>>If the worst you could do by flipping a coin 1 million times is to get heads 70%
>>>>>of the time, one should conclude the coin is unbiased? I don't think so. You're
>>>>>right to think 70-30 is a significant result. There is some asymmetry (a bug?)
>>>>>going on where none is expected.
>>>>
>>>>My original idea was that somewhere along the way, you _must_ make a decision
>>>>about which of two things to do next. Flipping a bit complicates the process
>>>>if it has to be done many times. The old Cray did it by using the processor
>>>>ID to break ties. It is possible that somewhere in the PIV core, there is
>>>>a tie-break that is not 50-50. It is also possible that the results I am
>>>>getting are somehow wrong...
>>>
>>>"Tie breaking" is not the issue. If you read the thing Anthony posted, you'd
>>>know that all the P4's resources are evenly divided: instruction window, reorder
>>>registers, reorder buffer, load/store queues, everything. How can you have
>>>unbalanced execution when each thread gets half of everything? I don't think you
>>>could if you wanted.
>>>
>>>When I suggested that you and Vincent had the same problem, with unbalanced
>>>processing, you bitched that you had really reported two problems. Well, that's
>>>not so clear to me. If your dual proc machine is searching 1.5x the NPS that it
>>>should, is it unreasonable to think that maybe one of the processors is somehow
>>>idle (spinning) half the time? Because if that's the case, running both threads
>>>on a HT processor could easily result in a 66-33 disparity in NPS per thread,
>>>which is pretty damn close to what you're seeing.
>>>
>>>Occam's razor, Bob.
>>>
>>>-Tom
>>
>>
>>But the principle doesn't apply here. Why? because I _know_ that one thread
>>is not "spinning" or "waiting" whatsoever. I _carefully_ account for every
>>spin while waiting on work. It averages about 5% out of 400% on a 3 minute
>>search.
>>
>>There is more going on than meets the eye, from several perspectives.
>
>Sure. What's more likely?
>
>1. Your code that accounts for spins is not working right for some reason.

Probability zero, because it works on _every_ other machine here.

>2. Your splitting algorithm sends all the slow nodes to the 2nd thread.

I don't see any "slow" nodes, so I don't know what that means...

>3. Your 2nd processor underclocks itself by 1/2 when it detects that it's
>running a Crafty thread.

Or cosmic rays...

>4. You really are searching those nodes, but due to a bug from overclocking, the
>ALU doesn't increment the node counter.
>5.
>

Or, most likely, by 100:1 odds, "none of the above."

>-Tom
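The arithmetic behind the competing claims in this thread can be checked with a small sketch. It assumes Tom's hypothetical model, in which the first thread never idles while the second does useful work only part of the time, and it uses Ricardo's hypothetical 1,000,000 coin flips; neither is a measurement taken on the dual Xeon, and the function names are illustrative only.

```python
"""Back-of-envelope checks for the numbers argued over in this thread.

Assumptions (hypothetical, not measured): thread A never idles, thread B
does useful work only (1 - idle_fraction) of the time, and the coin-flip
numbers are Ricardo's analogy rather than real node counts.
"""
import math


def per_thread_share(idle_fraction: float) -> tuple[float, float]:
    """Fraction of all searched nodes done by thread A and thread B
    when only thread B sits idle (spinning) for idle_fraction of the time."""
    busy_a = 1.0
    busy_b = 1.0 - idle_fraction
    total = busy_a + busy_b
    return busy_a / total, busy_b / total


def heads_z_score(heads: int, flips: int, p: float = 0.5) -> float:
    """Standard score of a heads count under a binomial model with P(heads) = p."""
    mean = flips * p
    sd = math.sqrt(flips * p * (1.0 - p))
    return (heads - mean) / sd


if __name__ == "__main__":
    idle = 0.5  # one processor idle half the time (Tom's hypothesis)
    a, b = per_thread_share(idle)
    print(f"speedup {1 + (1 - idle):.1f}x, split {a:.1%} / {b:.1%}")
    # 70-30 expressed as a ratio ("70-30 is > 2:1").
    print(f"70/30 = {70 / 30:.2f}")
    # Ricardo's analogy: 70% heads in one million fair flips.
    print(f"z = {heads_z_score(700_000, 1_000_000):.0f}")
```

Under that model an idle fraction of 0.5 gives the 1.5x total speed and a 66.7% / 33.3% split, which is the "66-33 disparity" Tom cites. In the coin analogy, 700,000 heads in a million fair flips lies about 400 standard deviations above the mean, which is why Ricardo treats a 70-30 split as significant. Whether Crafty's actual threads fit the idle-thread model is exactly what Bob disputes, since his spin accounting shows about 5% out of 400%.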