Author: Robert Hyatt
Date: 21:01:34 04/15/03
On April 15, 2003 at 14:56:38, Tom Kerrigan wrote:

>On April 14, 2003 at 23:41:10, Robert Hyatt wrote:
>
>>On April 14, 2003 at 18:19:28, Ricardo Gibert wrote:
>>
>>>On April 14, 2003 at 17:54:14, Robert Hyatt wrote:
>>>
>>>>On April 14, 2003 at 15:50:22, Tom Kerrigan wrote:
>>>>
>>>>>On April 13, 2003 at 11:21:51, Robert Hyatt wrote:
>>>>>
>>>>>>On April 13, 2003 at 02:37:57, Tom Kerrigan wrote:
>>>>>>
>>>>>>>On April 13, 2003 at 01:04:52, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>It _is_ pinned on SMT. The two logical processors are producing wildly
>>>>>>>>imbalanced results when using threads, vs using two separate processes. It
>>>>>>>>would appear to be cache-related...
>>>>>>>
>>>>>>>This is some sort of joke, right? You and Vincent see the same behavior, you
>>>>>>>have SMT and Vincent doesn't, and somehow the problem is with SMT?
>>>>>>
>>>>>>
>>>>>>The _variability_ is with SMT. What are you talking about? I reported _two_
>>>>>>issues.
>>>>>>
>>>>>>1. My dual xeon runs two copies of crafty about 2x as fast as if they were
>>>>>>run one after the other. So does my quad 700.
>>>>>>
>>>>>>2. My dual xeon runs one copy, two threads, at about 1.5X the speed that it
>>>>>>should.
>>>>>>
>>>>>>That is a problem.
>>>>>>
>>>>>>The second issue is that my dual xeon does _not_ run threaded crafty in a
>>>>>>balanced way on two logical processors. For two independent copies, it
>>>>>>varies from 50-50 to 45-55. Not unreasonable. But for the single threaded
>>>>>>copy, it varies all the way to 70-30. _that_ is an SMT issue. Probably, as
>>>>>>I mentioned, caused by some unknown L2 cache issue. But it _is_ a problem
>>>>>>with SMT if you want to assume that normally it is about 50-50 roughly, for
>>>>>>_regular_ applications.
>>>>>>
>>>>>>shared memory, locks, etc are causing something strange to happen.
>>>>>
>>>>>It looks like you're having enough problems and unexplained behavior already
>>>>>that it's hard to trust any sort of numbers you post. But still, if the widest
>>>>>disparity you measured was 70-30, that seems like enough to dispel your notion
>>>>>that one thread always gets priority over the other.
>>>>>
>>>>>-Tom
>>>>
>>>>
>>>>How?
>>>>
>>>>70-30 is > 2:1.
>>>>
>>>>Something is going on.
>>>
>>>
>>>If the worst you could do by flipping a coin 1 million times is to get heads 70%
>>>of the time, one should conclude the coin is unbiased? I don't think so. You're
>>>right to think 70-30 is a significant result. There is some asymmetry (a bug?)
>>>going on where none is expected.
>>
>>My original idea was that somewhere along the way, you _must_ make a decision
>>about which of two things to do next. Flipping a bit complicates the process
>>if it has to be done many times. The old Cray did it by using the processor
>>ID to break ties. It is possible that somewhere in the PIV core, there is
>>a tie-break that is not 50-50. It is also possible that the results I am
>>getting are somehow wrong...
>
>"Tie breaking" is not the issue. If you read the thing Anthony posted, you'd
>know that all the P4's resources are evenly divided: instruction window, reorder
>registers, reorder buffer, load/store queues, everything. How can you have
>unbalanced execution when each thread gets half of everything? I don't think you
>could if you wanted.
>
>When I suggested that you and Vincent had the same problem, with unbalanced
>processing, you bitched that you had really reported two problems. Well, that's
>not so clear to me. If your dual proc machine is searching 1.5x the NPS that it
>should, is it unreasonable to think that maybe one of the processors is somehow
>idle (spinning) half the time? Because if that's the case, running both threads
>on a HT processor could easily result in a 66-33 disparity in NPS per thread,
>which is pretty damn close to what you're seeing.
>
>Occam's razor, Bob.
>
>-Tom

But the principle doesn't apply here. Why? Because I _know_ that one thread is not "spinning" or "waiting" whatsoever.
I _carefully_ account for every spin while waiting for work. On a 3-minute search it averages about 5% out of a possible 400% (four logical processors, each 100% busy). There is more going on than meets the eye, from several perspectives.
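Three pieces of arithmetic run through this exchange: Ricardo's point that a 70-30 split is far outside what chance would produce if the two logical CPUs were treated evenly, Tom's hypothesis that one thread spinning half the time would yield roughly a 2:1 (66-33) NPS split, and the spin fraction reported above. The minimal Python sketch below works through all three. It assumes each unit of work goes independently to either thread with probability 0.5. That is a simplifying model for illustration, not a claim about how the Pentium 4 actually schedules SMT threads, and every function name in it is hypothetical.

```python
import math


def z_score_fair_split(share: float, n: int) -> float:
    """Standard deviations between an observed share and 50%, under a
    hypothetical model where each of n work units goes to either thread
    with probability 0.5 (normal approximation to the binomial)."""
    sd = math.sqrt(0.25 / n)  # standard deviation of the observed proportion
    return (share - 0.5) / sd


def split_if_one_thread_idles(idle_fraction: float) -> tuple[float, float]:
    """Tom's hypothesis as a toy model: thread A does useful work all the
    time, thread B only (1 - idle_fraction) of the time.
    Returns each thread's share of the total NPS."""
    a, b = 1.0, 1.0 - idle_fraction
    return a / (a + b), b / (a + b)


def spin_fraction(spin_pct: float, total_pct: float) -> float:
    """Fraction of available CPU time spent spinning, e.g. 5% out of 400%."""
    return spin_pct / total_pct


if __name__ == "__main__":
    # Ricardo's coin: heads 70% of the time over 1,000,000 flips.
    print(f"z = {z_score_fair_split(0.70, 1_000_000):.0f}")
    a, b = split_if_one_thread_idles(0.5)
    print(f"idle-half model: {a:.0%} / {b:.0%}")
    print(f"measured spin: {spin_fraction(5, 400):.2%} of total CPU time")
```

Under that simplified model, a 70-30 split over a million units sits about 400 standard deviations from 50-50, which is why it cannot be dismissed as noise. Real node counts are neither independent nor randomly assigned, so the figure only illustrates scale.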
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.