Author: Matt Taylor
Date: 20:49:53 12/26/02
Go up one level in this thread
On December 26, 2002 at 23:22:36, Robert Hyatt wrote: >On December 26, 2002 at 23:00:27, Matt Taylor wrote: > >>On December 26, 2002 at 00:52:11, Robert Hyatt wrote: >> >>>On December 25, 2002 at 14:13:47, Matt Taylor wrote: >>> >>>>On December 25, 2002 at 04:17:34, Dave Gomboc wrote: >>>> >>>>>On December 23, 2002 at 21:33:50, Matt Taylor wrote: >>>>> >>>>>>On December 23, 2002 at 21:29:08, Arshad F. Syed wrote: >>>>>> >>>>>>>Is there anyone here who has actually converted their C code to C++ while >>>>>>>keeping the NPS the same or even managed to improve upon it? Sune did mention >>>>>>>that he did so only at the cost of 15% degradation in speed. Why do people even >>>>>>>bother wasting their time with C++ for a chess program when it inevitably leads >>>>>>>to taking a hit in NPS? Regardless, of all the advice to simply inline the >>>>>>>functions, I have yet to see a living example of chess program code which >>>>>>>performed faster in C++. >>>>>>> >>>>>>>Regards, >>>>>>>Arshad >>>>>> >>>>>>The biggest hit you take in C++ is virtual functions, and inevitably people use >>>>>>them. The virtual functions are very damaging due to the fact that they're >>>>>>implemented with a function pointer, and at the machine level it translates into >>>>>>an indirect call. The indirect calls branch mispredict because the CPU can't >>>>>>figure out ahead-of-time where you're calling to, and branch mispredicts are >>>>>>expensive. >>>>> >>>>>Please excuse my ignorance, but where is the branch? It's an indirect jump, >>>>>isn't it? There's nothing optional about it, it must be followed. Is it that I >>>>>am not familiar enough with the terminology? e.g. I am thinking that by "branch >>>>>prediction" you may be including any operation that moves the program counter in >>>>>any way other than advancing sequentially to the next instruction. >>>>> >>>>>In any case, I'm confident that if you have a serious inheritance hierarchy (as >>>>>opposed to a toy example) you're not going to lose any sleep over the virtual >>>>>function call. Many if-then-elses aren't efficient either. >>>>> >>>>>Dave >>>> >>>>A branch is anything that changes the flow of execution from the pc++ scheme. An >>>>unconditional jump/call is still a branch. >>>> >>>>The problem that superscalar processors have is that they're executing multiple >>>>things simultaneously, and often to predict the target of a branch, you need to >>>>have completed execution of instructions almost immediately prior. >>>> >>>>Consider the example I gave in a previous post: >>>>mov esi, [ecx+SomeFuncPtr] >>>>call esi >>>>call esi >>>> >>>>This is an indirect call. The machine loads the function pointer into the esi >>>>register, and then the contents of that register are used as the target of the >>>>unconditional branch. The problem is that the machine doesn't get around to >>>>storing into esi until a cycle or so before the call, and the pipeline is some >>>>20+ cycles deep (depending on processor). After it decodes the call instruction, >>>>it needs to know where to go next, but it has to stall for those 20 cycles until >>>>it knows what gets written to esi. >>>> >>>>Static prediction assumes that it's not going to branch (in this case). >>>>Obviously we know that it does, but rather than stall the pipeline for those 20+ >>>>cycles until they know WHERE it goes to, they choose simply to continue decode >>>>at the wrong place. (If you make a small function call, this is actually >>>>marginally advantageous over doing nothing since you've already got the return >>>>address in the cache.) >>>> >>> >>>I'm not quite following here. So far as I understand the processor, there is >>>_no_ branch prediction for an unconditional branch, and it can _never_ be >>>wrong. Forward conditional branches are assumed not taken if they don't appear >>>in the BTB, backward branches are assumed to be taken if they are not in the >>>BTB, otherwise the BTB dictates. But for _unconditional branches_ I don't >>>believe _any_ of this applies. >> >>Yes and no. There are two types of unconditional branches: direct and indirect. >>The above rules are true for static prediction (when no BTB entry exists) except >>for indirect branches. >> >>The Intel Xeon optimization manual specifies that indirect branches are >>predicted as not taken (section 2-15), presumably because it is difficult to >>predict. I have read similar things about the Athlon, that it can't predict >>indirect branches, but I cannot find the reference. > >OK. this sounds like a _bad_ fix. IE intentionally predicting that the >branch won't be taken, knowing it will. ANd probably using a bogus branch >address as well since it is unknown at the time. > >The problem is that this will certainly interfere with SMT, which is a shame >and a waste, if that is what they are doing. "Not taken" means execution contiues with the pc++ scheme (though we all know that x86 instructions are not uniformly sized). It isn't necessarily a complete waste. If the function called is short, you return with the next set of code in the trace cache and ready to execute -- better than doing nothing when the processor doesn't do SMT. It would be interesting for SMT to see them fix branch issues by running other threads while it predicted. I laughed when I read about Sparc because their branch fix was so simple; this would be a very elegant solution to the branch problem with 100% prediction accuracy and higher throughput. However, you would hurt performance on single-threaded applications... -Matt
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.