Author: Robert Hyatt
Date: 20:22:36 12/26/02
Go up one level in this thread
On December 26, 2002 at 23:00:27, Matt Taylor wrote: >On December 26, 2002 at 00:52:11, Robert Hyatt wrote: > >>On December 25, 2002 at 14:13:47, Matt Taylor wrote: >> >>>On December 25, 2002 at 04:17:34, Dave Gomboc wrote: >>> >>>>On December 23, 2002 at 21:33:50, Matt Taylor wrote: >>>> >>>>>On December 23, 2002 at 21:29:08, Arshad F. Syed wrote: >>>>> >>>>>>Is there anyone here who has actually converted their C code to C++ while >>>>>>keeping the NPS the same or even managed to improve upon it? Sune did mention >>>>>>that he did so only at the cost of 15% degradation in speed. Why do people even >>>>>>bother wasting their time with C++ for a chess program when it inevitably leads >>>>>>to taking a hit in NPS? Regardless, of all the advice to simply inline the >>>>>>functions, I have yet to see a living example of chess program code which >>>>>>performed faster in C++. >>>>>> >>>>>>Regards, >>>>>>Arshad >>>>> >>>>>The biggest hit you take in C++ is virtual functions, and inevitably people use >>>>>them. The virtual functions are very damaging due to the fact that they're >>>>>implemented with a function pointer, and at the machine level it translates into >>>>>an indirect call. The indirect calls branch mispredict because the CPU can't >>>>>figure out ahead-of-time where you're calling to, and branch mispredicts are >>>>>expensive. >>>> >>>>Please excuse my ignorance, but where is the branch? It's an indirect jump, >>>>isn't it? There's nothing optional about it, it must be followed. Is it that I >>>>am not familiar enough with the terminology? e.g. I am thinking that by "branch >>>>prediction" you may be including any operation that moves the program counter in >>>>any way other than advancing sequentially to the next instruction. >>>> >>>>In any case, I'm confident that if you have a serious inheritance hierarchy (as >>>>opposed to a toy example) you're not going to lose any sleep over the virtual >>>>function call. Many if-then-elses aren't efficient either. >>>> >>>>Dave >>> >>>A branch is anything that changes the flow of execution from the pc++ scheme. An >>>unconditional jump/call is still a branch. >>> >>>The problem that superscalar processors have is that they're executing multiple >>>things simultaneously, and often to predict the target of a branch, you need to >>>have completed execution of instructions almost immediately prior. >>> >>>Consider the example I gave in a previous post: >>>mov esi, [ecx+SomeFuncPtr] >>>call esi >>>call esi >>> >>>This is an indirect call. The machine loads the function pointer into the esi >>>register, and then the contents of that register are used as the target of the >>>unconditional branch. The problem is that the machine doesn't get around to >>>storing into esi until a cycle or so before the call, and the pipeline is some >>>20+ cycles deep (depending on processor). After it decodes the call instruction, >>>it needs to know where to go next, but it has to stall for those 20 cycles until >>>it knows what gets written to esi. >>> >>>Static prediction assumes that it's not going to branch (in this case). >>>Obviously we know that it does, but rather than stall the pipeline for those 20+ >>>cycles until they know WHERE it goes to, they choose simply to continue decode >>>at the wrong place. (If you make a small function call, this is actually >>>marginally advantageous over doing nothing since you've already got the return >>>address in the cache.) >>> >> >>I'm not quite following here. So far as I understand the processor, there is >>_no_ branch prediction for an unconditional branch, and it can _never_ be >>wrong. Forward conditional branches are assumed not taken if they don't appear >>in the BTB, backward branches are assumed to be taken if they are not in the >>BTB, otherwise the BTB dictates. But for _unconditional branches_ I don't >>believe _any_ of this applies. > >Yes and no. There are two types of unconditional branches: direct and indirect. >The above rules are true for static prediction (when no BTB entry exists) except >for indirect branches. > >The Intel Xeon optimization manual specifies that indirect branches are >predicted as not taken (section 2-15), presumably because it is difficult to >predict. I have read similar things about the Athlon, that it can't predict >indirect branches, but I cannot find the reference. OK. this sounds like a _bad_ fix. IE intentionally predicting that the branch won't be taken, knowing it will. ANd probably using a bogus branch address as well since it is unknown at the time. The problem is that this will certainly interfere with SMT, which is a shame and a waste, if that is what they are doing. > >The problem here lies in the question, "How many indirect branches have the >target address over X cycles ahead of time where X is the latency of the >pipeline?" You have to also remember that function calls complicate things since >they destroy contents of certain registers. Other "preserved" registers don't >necessarily remain unchanged -- they simply get their value back at the end of >the function. > >>>In the case of an if/then/else, some implementations will work on BOTH branches >>>so you don't lose as much. Either way, the if/then/else can predict right, and >>>static prediction can NEVER get it right the first time. >>> >>>-Matt >> >>Actually they can be right _most_ of time time if you write the code correctly. >>If there is a 50% probability, no. But if one path is more common, then that >>path should be predicted almost all the time, which will make it right most of >>the time also... > >Yes, with code written well. I believe I said the same thing to Vincent >recently. :-) > >-Matt
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.