Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: C and C++ --- NPS

Author: Matt Taylor

Date: 20:49:53 12/26/02

Go up one level in this thread


On December 26, 2002 at 23:22:36, Robert Hyatt wrote:

>On December 26, 2002 at 23:00:27, Matt Taylor wrote:
>
>>On December 26, 2002 at 00:52:11, Robert Hyatt wrote:
>>
>>>On December 25, 2002 at 14:13:47, Matt Taylor wrote:
>>>
>>>>On December 25, 2002 at 04:17:34, Dave Gomboc wrote:
>>>>
>>>>>On December 23, 2002 at 21:33:50, Matt Taylor wrote:
>>>>>
>>>>>>On December 23, 2002 at 21:29:08, Arshad F. Syed wrote:
>>>>>>
>>>>>>>Is there anyone here who has actually converted their C code to C++ while
>>>>>>>keeping the NPS the same or even managed to improve upon it? Sune did mention
>>>>>>>that he did so only at the cost of 15% degradation in speed. Why do people even
>>>>>>>bother wasting their time with C++ for a chess program when it inevitably leads
>>>>>>>to taking a hit in NPS? Regardless, of all the advice to simply inline the
>>>>>>>functions, I have yet to see a living example of chess program code which
>>>>>>>performed faster in C++.
>>>>>>>
>>>>>>>Regards,
>>>>>>>Arshad
>>>>>>
>>>>>>The biggest hit you take in C++ is virtual functions, and inevitably people use
>>>>>>them. The virtual functions are very damaging due to the fact that they're
>>>>>>implemented with a function pointer, and at the machine level it translates into
>>>>>>an indirect call. The indirect calls branch mispredict because the CPU can't
>>>>>>figure out ahead-of-time where you're calling to, and branch mispredicts are
>>>>>>expensive.
>>>>>
>>>>>Please excuse my ignorance, but where is the branch?  It's an indirect jump,
>>>>>isn't it?  There's nothing optional about it, it must be followed.  Is it that I
>>>>>am not familiar enough with the terminology?  e.g. I am thinking that by "branch
>>>>>prediction" you may be including any operation that moves the program counter in
>>>>>any way other than advancing sequentially to the next instruction.
>>>>>
>>>>>In any case, I'm confident that if you have a serious inheritance hierarchy (as
>>>>>opposed to a toy example) you're not going to lose any sleep over the virtual
>>>>>function call.  Many if-then-elses aren't efficient either.
>>>>>
>>>>>Dave
>>>>
>>>>A branch is anything that changes the flow of execution from the pc++ scheme. An
>>>>unconditional jump/call is still a branch.
>>>>
>>>>The problem that superscalar processors have is that they're executing multiple
>>>>things simultaneously, and often to predict the target of a branch, you need to
>>>>have completed execution of instructions almost immediately prior.
>>>>
>>>>Consider the example I gave in a previous post:
>>>>mov    esi, [ecx+SomeFuncPtr]
>>>>call   esi
>>>>call   esi
>>>>
>>>>This is an indirect call. The machine loads the function pointer into the esi
>>>>register, and then the contents of that register are used as the target of the
>>>>unconditional branch. The problem is that the machine doesn't get around to
>>>>storing into esi until a cycle or so before the call, and the pipeline is some
>>>>20+ cycles deep (depending on processor). After it decodes the call instruction,
>>>>it needs to know where to go next, but it has to stall for those 20 cycles until
>>>>it knows what gets written to esi.
>>>>
>>>>Static prediction assumes that it's not going to branch (in this case).
>>>>Obviously we know that it does, but rather than stall the pipeline for those 20+
>>>>cycles until they know WHERE it goes to, they choose simply to continue decode
>>>>at the wrong place. (If you make a small function call, this is actually
>>>>marginally advantageous over doing nothing since you've already got the return
>>>>address in the cache.)
>>>>
>>>
>>>I'm not quite following here.  So far as I understand the processor, there is
>>>_no_ branch prediction for an unconditional branch, and it can _never_ be
>>>wrong.  Forward conditional branches are assumed not taken if they don't appear
>>>in the BTB, backward branches are assumed to be taken if they are not in the
>>>BTB, otherwise the BTB dictates.  But for _unconditional branches_ I don't
>>>believe _any_ of this applies.
>>
>>Yes and no. There are two types of unconditional branches: direct and indirect.
>>The above rules are true for static prediction (when no BTB entry exists) except
>>for indirect branches.
>>
>>The Intel Xeon optimization manual specifies that indirect branches are
>>predicted as not taken (section 2-15), presumably because it is difficult to
>>predict. I have read similar things about the Athlon, that it can't predict
>>indirect branches, but I cannot find the reference.
>
>OK.  this sounds like a _bad_ fix.  IE intentionally predicting that the
>branch won't be taken, knowing it will.  ANd probably using a bogus branch
>address as well since it is unknown at the time.
>
>The problem is that this will certainly interfere with SMT, which is a shame
>and a waste, if that is what they are doing.

"Not taken" means execution contiues with the pc++ scheme (though we all know
that x86 instructions are not uniformly sized). It isn't necessarily a complete
waste. If the function called is short, you return with the next set of code in
the trace cache and ready to execute -- better than doing nothing when the
processor doesn't do SMT.

It would be interesting for SMT to see them fix branch issues by running other
threads while it predicted. I laughed when I read about Sparc because their
branch fix was so simple; this would be a very elegant solution to the branch
problem with 100% prediction accuracy and higher throughput. However, you would
hurt performance on single-threaded applications...

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.