Computer Chess Club Archives




Subject: Re: C and C++ --- NPS

Author: Robert Hyatt

Date: 21:43:42 12/26/02

Go up one level in this thread

On December 26, 2002 at 23:49:53, Matt Taylor wrote:

>On December 26, 2002 at 23:22:36, Robert Hyatt wrote:
>>On December 26, 2002 at 23:00:27, Matt Taylor wrote:
>>>On December 26, 2002 at 00:52:11, Robert Hyatt wrote:
>>>>On December 25, 2002 at 14:13:47, Matt Taylor wrote:
>>>>>On December 25, 2002 at 04:17:34, Dave Gomboc wrote:
>>>>>>On December 23, 2002 at 21:33:50, Matt Taylor wrote:
>>>>>>>On December 23, 2002 at 21:29:08, Arshad F. Syed wrote:
>>>>>>>>Is there anyone here who has actually converted their C code to C++ while
>>>>>>>>keeping the NPS the same or even managed to improve upon it? Sune did mention
>>>>>>>>that he did so only at the cost of 15% degradation in speed. Why do people even
>>>>>>>>bother wasting their time with C++ for a chess program when it inevitably leads
>>>>>>>>to taking a hit in NPS? Regardless, of all the advice to simply inline the
>>>>>>>>functions, I have yet to see a living example of chess program code which
>>>>>>>>performed faster in C++.
>>>>>>>The biggest hit you take in C++ is virtual functions, and inevitably people use
>>>>>>>them. The virtual functions are very damaging due to the fact that they're
>>>>>>>implemented with a function pointer, and at the machine level it translates into
>>>>>>>an indirect call. The indirect calls branch mispredict because the CPU can't
>>>>>>>figure out ahead-of-time where you're calling to, and branch mispredicts are
>>>>>>Please excuse my ignorance, but where is the branch?  It's an indirect jump,
>>>>>>isn't it?  There's nothing optional about it, it must be followed.  Is it that I
>>>>>>am not familiar enough with the terminology?  e.g. I am thinking that by "branch
>>>>>>prediction" you may be including any operation that moves the program counter in
>>>>>>any way other than advancing sequentially to the next instruction.
>>>>>>In any case, I'm confident that if you have a serious inheritance hierarchy (as
>>>>>>opposed to a toy example) you're not going to lose any sleep over the virtual
>>>>>>function call.  Many if-then-elses aren't efficient either.
>>>>>A branch is anything that changes the flow of execution from the pc++ scheme. An
>>>>>unconditional jump/call is still a branch.
>>>>>The problem that superscalar processors have is that they're executing multiple
>>>>>things simultaneously, and often to predict the target of a branch, you need to
>>>>>have completed execution of instructions almost immediately prior.
>>>>>Consider the example I gave in a previous post:
>>>>>mov    esi, [ecx+SomeFuncPtr]
>>>>>call   esi
>>>>>call   esi
>>>>>This is an indirect call. The machine loads the function pointer into the esi
>>>>>register, and then the contents of that register are used as the target of the
>>>>>unconditional branch. The problem is that the machine doesn't get around to
>>>>>storing into esi until a cycle or so before the call, and the pipeline is some
>>>>>20+ cycles deep (depending on processor). After it decodes the call instruction,
>>>>>it needs to know where to go next, but it has to stall for those 20 cycles until
>>>>>it knows what gets written to esi.
>>>>>Static prediction assumes that it's not going to branch (in this case).
>>>>>Obviously we know that it does, but rather than stall the pipeline for those 20+
>>>>>cycles until they know WHERE it goes to, they choose simply to continue decode
>>>>>at the wrong place. (If you make a small function call, this is actually
>>>>>marginally advantageous over doing nothing since you've already got the return
>>>>>address in the cache.)
>>>>I'm not quite following here.  So far as I understand the processor, there is
>>>>_no_ branch prediction for an unconditional branch, and it can _never_ be
>>>>wrong.  Forward conditional branches are assumed not taken if they don't appear
>>>>in the BTB, backward branches are assumed to be taken if they are not in the
>>>>BTB, otherwise the BTB dictates.  But for _unconditional branches_ I don't
>>>>believe _any_ of this applies.
>>>Yes and no. There are two types of unconditional branches: direct and indirect.
>>>The above rules are true for static prediction (when no BTB entry exists) except
>>>for indirect branches.
>>>The Intel Xeon optimization manual specifies that indirect branches are
>>>predicted as not taken (section 2-15), presumably because it is difficult to
>>>predict. I have read similar things about the Athlon, that it can't predict
>>>indirect branches, but I cannot find the reference.
>>OK.  this sounds like a _bad_ fix.  IE intentionally predicting that the
>>branch won't be taken, knowing it will.  ANd probably using a bogus branch
>>address as well since it is unknown at the time.
>>The problem is that this will certainly interfere with SMT, which is a shame
>>and a waste, if that is what they are doing.
>"Not taken" means execution contiues with the pc++ scheme (though we all know
>that x86 instructions are not uniformly sized). It isn't necessarily a complete
>waste. If the function called is short, you return with the next set of code in
>the trace cache and ready to execute -- better than doing nothing when the
>processor doesn't do SMT.
>It would be interesting for SMT to see them fix branch issues by running other
>threads while it predicted. I laughed when I read about Sparc because their
>branch fix was so simple; this would be a very elegant solution to the branch
>problem with 100% prediction accuracy and higher throughput. However, you would
>hurt performance on single-threaded applications...

Are you talking about the branch delay-slot on the sparc, or something

Note it isn't a full fix, it just solves the problem of fetching in stage
1, decoding in stage 2, and the instruction has to get to stage 2 before
you know it is a branch and by then you have fetched PC+1...

This page took 0.02 seconds to execute

Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.