Author: Robert Hyatt
Date: 09:07:23 02/20/03
Go up one level in this thread
On February 20, 2003 at 01:22:48, Matt Taylor wrote: >On February 19, 2003 at 20:36:06, Robert Hyatt wrote: > >>On February 19, 2003 at 18:22:35, Tom Kerrigan wrote: >> >>>On February 19, 2003 at 00:29:06, Robert Hyatt wrote: >>> >>>>On February 18, 2003 at 13:33:12, Tom Kerrigan wrote: >>>> >>>>>On February 16, 2003 at 03:03:03, Matt Taylor wrote: >>>>> >>>>>>On February 15, 2003 at 21:28:39, Tom Kerrigan wrote: >>>>>> >>>>>>>On February 13, 2003 at 19:40:45, Matt Taylor wrote: >>>>>>> >>>>>>>>You're not getting it. Logic on the processor for static branch prediction is >>>>>>>>80% accurate because auxillary information available to the compiler is thrown >>>>>>>>out. Consider the following loop: >>>>>>>>for(i = 0; i < 1000; i++) >>>>>>>> do_something(); >>>>>>> >>>>>>>You're the one who's not getting it if you think processors have logic for >>>>>>>static branch prediction (hint: processors do dynamic prediction) or if you >>>>>>>think these are the kinds of branches that matter for execution or compilation. >>>>>>>(Any branch prediction scheme would predict your branch with 99.9% accuracy.) >>>>>> >>>>>>Doesn't matter what you call it. AMD seems to think my Athlon has static branch >>>>>>prediction. I'm not sure why you disagree. >>>>> >>>>>I'm not sure why. Static prediction is defined as before runtime, at least for >>>>>all the definitions I've seen. Maybe they're referring to data that the Athlon >>>>>can collect for use with static predictors, i.e., profile directed compiling. >>>> >>>> >>>>Static can also mean "based on previous results". IE a killer move is a >>>>static move ordering idea, it is based on history, not on what is happening >>>>_right now_... I don't know that that is what they mean, of course. >>> >>>Obviously not. Look up any dynamic branch prediction scheme and you will see >>>that it's based on the history of whether or not the branch has been taken. >>>Dynamic, in the case of branch prediction, means that it's done while the >>>program is running (moving, so to speak), i.e., what processors do. >>> >>>-Tom >> >> >>Then the only other explanation for "static" I can think of is what happens when >>the BTB doesn't have the target address in it. If the branch goes backward, it >>is assumed taken (a loop) while if it goes forward, it is assumed not taken. >> >>However, while I can't speak for AMD, Intel does it both ways. The BTB is the >>preferred way (and the PIV uses a really neat approach to recognize patterns of >>branches being taken and not taken) but when that fails, the "old school" way >>of branch direction is used.. > >Yes, this is the way I used the term. I have seen "static prediction" referred >to as the initial guess when there is no BTB entry. Perhaps the term is >incorrect; perhaps it is. > >I read a year or so ago about the Pentium's (as in P5) branch prediction scheme. >It entered either in weakly taken or weakly not taken; I don't recall which. All >Intel processors since then predict based on whether the branch jumps forward or >backward. I don't know anything about the K5, but I believe the K6 did this as >well. I know the Athlon does. > >The pattern recognition scheme has been around for a while. I don't know how the >Pentium 4 does it, but the P6 core was able to recognize patterns shorter than >16 elements. Athlon is very similar but supposedly slightly better. I'd imagine >that the Pentium 4 is yet another variant. > >Just for trivia, the original Pentium used the basic 2-bit counter scheme. Due >to a bug, the processor would reset the counter from state 3 (strongly taken) to >state 0 (strongly not taken) rather than moving to state 2. There was a very >interesting article regarding it on x86.org. > >-Matt I don't remember where the original "pattern" prediction paper was published, but I have it in my files somewhere. And this is supposedly what Intel did in the PIV branch prediction. I have not seen them claim this specifically, but the idea from the paper is this: a single BTB entry is 64 bits, which is enough for 32 of the 4-state finite state machines with weakly taken, strongly taken, weakly not-taken and strongly not-taken. Those work as always, except there are 32 of them. There is a 5-bit mask indicating whether the previous five branches to this target were taken (1) or not-taken (0). This mask indexes to the right 2-bit FSM for that particular pattern. That FSM is used to predict the branch, which means that if any pattern is 5 or less, no matter what combo of taken and not-taken, it will end up predicted 100%. However, for the caveat. I seem to recall in a simple conversation with someone at Intel, that they didn't _quite_ get all of this to work, becauase the 5 bit signature took room in the BTB. He mumbled something about we can catch patterns 4 or 5 long depending, which leads me to believe that maybe the 5-bit signature is in the BTB, which would displace the last three FSM bit pairs. I'll see if I can find some real details, as most of that was in the paper describing this and Intel had published that they were using this in the PIV. So don't hold me to it, although I think everyone would agree that the pattern idea is pretty cute, and not hard to deal with in a hardware implementation (very little additional complexity over the old 4-state FSM approach everyone has been using). The only downside is that since a BTB grows significantly in size, it has significantly fewer total entries... because each entry is now more than 2 bits of FSM plus the target address to match on.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.