Author: Robert Hyatt
Date: 21:43:42 12/26/02
Go up one level in this thread
On December 26, 2002 at 23:49:53, Matt Taylor wrote: >On December 26, 2002 at 23:22:36, Robert Hyatt wrote: > >>On December 26, 2002 at 23:00:27, Matt Taylor wrote: >> >>>On December 26, 2002 at 00:52:11, Robert Hyatt wrote: >>> >>>>On December 25, 2002 at 14:13:47, Matt Taylor wrote: >>>> >>>>>On December 25, 2002 at 04:17:34, Dave Gomboc wrote: >>>>> >>>>>>On December 23, 2002 at 21:33:50, Matt Taylor wrote: >>>>>> >>>>>>>On December 23, 2002 at 21:29:08, Arshad F. Syed wrote: >>>>>>> >>>>>>>>Is there anyone here who has actually converted their C code to C++ while >>>>>>>>keeping the NPS the same or even managed to improve upon it? Sune did mention >>>>>>>>that he did so only at the cost of 15% degradation in speed. Why do people even >>>>>>>>bother wasting their time with C++ for a chess program when it inevitably leads >>>>>>>>to taking a hit in NPS? Regardless, of all the advice to simply inline the >>>>>>>>functions, I have yet to see a living example of chess program code which >>>>>>>>performed faster in C++. >>>>>>>> >>>>>>>>Regards, >>>>>>>>Arshad >>>>>>> >>>>>>>The biggest hit you take in C++ is virtual functions, and inevitably people use >>>>>>>them. The virtual functions are very damaging due to the fact that they're >>>>>>>implemented with a function pointer, and at the machine level it translates into >>>>>>>an indirect call. The indirect calls branch mispredict because the CPU can't >>>>>>>figure out ahead-of-time where you're calling to, and branch mispredicts are >>>>>>>expensive. >>>>>> >>>>>>Please excuse my ignorance, but where is the branch? It's an indirect jump, >>>>>>isn't it? There's nothing optional about it, it must be followed. Is it that I >>>>>>am not familiar enough with the terminology? e.g. I am thinking that by "branch >>>>>>prediction" you may be including any operation that moves the program counter in >>>>>>any way other than advancing sequentially to the next instruction. >>>>>> >>>>>>In any case, I'm confident that if you have a serious inheritance hierarchy (as >>>>>>opposed to a toy example) you're not going to lose any sleep over the virtual >>>>>>function call. Many if-then-elses aren't efficient either. >>>>>> >>>>>>Dave >>>>> >>>>>A branch is anything that changes the flow of execution from the pc++ scheme. An >>>>>unconditional jump/call is still a branch. >>>>> >>>>>The problem that superscalar processors have is that they're executing multiple >>>>>things simultaneously, and often to predict the target of a branch, you need to >>>>>have completed execution of instructions almost immediately prior. >>>>> >>>>>Consider the example I gave in a previous post: >>>>>mov esi, [ecx+SomeFuncPtr] >>>>>call esi >>>>>call esi >>>>> >>>>>This is an indirect call. The machine loads the function pointer into the esi >>>>>register, and then the contents of that register are used as the target of the >>>>>unconditional branch. The problem is that the machine doesn't get around to >>>>>storing into esi until a cycle or so before the call, and the pipeline is some >>>>>20+ cycles deep (depending on processor). After it decodes the call instruction, >>>>>it needs to know where to go next, but it has to stall for those 20 cycles until >>>>>it knows what gets written to esi. >>>>> >>>>>Static prediction assumes that it's not going to branch (in this case). >>>>>Obviously we know that it does, but rather than stall the pipeline for those 20+ >>>>>cycles until they know WHERE it goes to, they choose simply to continue decode >>>>>at the wrong place. (If you make a small function call, this is actually >>>>>marginally advantageous over doing nothing since you've already got the return >>>>>address in the cache.) >>>>> >>>> >>>>I'm not quite following here. So far as I understand the processor, there is >>>>_no_ branch prediction for an unconditional branch, and it can _never_ be >>>>wrong. Forward conditional branches are assumed not taken if they don't appear >>>>in the BTB, backward branches are assumed to be taken if they are not in the >>>>BTB, otherwise the BTB dictates. But for _unconditional branches_ I don't >>>>believe _any_ of this applies. >>> >>>Yes and no. There are two types of unconditional branches: direct and indirect. >>>The above rules are true for static prediction (when no BTB entry exists) except >>>for indirect branches. >>> >>>The Intel Xeon optimization manual specifies that indirect branches are >>>predicted as not taken (section 2-15), presumably because it is difficult to >>>predict. I have read similar things about the Athlon, that it can't predict >>>indirect branches, but I cannot find the reference. >> >>OK. this sounds like a _bad_ fix. IE intentionally predicting that the >>branch won't be taken, knowing it will. ANd probably using a bogus branch >>address as well since it is unknown at the time. >> >>The problem is that this will certainly interfere with SMT, which is a shame >>and a waste, if that is what they are doing. > >"Not taken" means execution contiues with the pc++ scheme (though we all know >that x86 instructions are not uniformly sized). It isn't necessarily a complete >waste. If the function called is short, you return with the next set of code in >the trace cache and ready to execute -- better than doing nothing when the >processor doesn't do SMT. > >It would be interesting for SMT to see them fix branch issues by running other >threads while it predicted. I laughed when I read about Sparc because their >branch fix was so simple; this would be a very elegant solution to the branch >problem with 100% prediction accuracy and higher throughput. However, you would >hurt performance on single-threaded applications... > >-Matt Are you talking about the branch delay-slot on the sparc, or something else? Note it isn't a full fix, it just solves the problem of fetching in stage 1, decoding in stage 2, and the instruction has to get to stage 2 before you know it is a branch and by then you have fetched PC+1...
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.