Computer Chess Club Archives




Subject: Re: C and C++ --- NPS

Author: Matt Taylor

Date: 11:13:47 12/25/02

Go up one level in this thread

On December 25, 2002 at 04:17:34, Dave Gomboc wrote:

>On December 23, 2002 at 21:33:50, Matt Taylor wrote:
>>On December 23, 2002 at 21:29:08, Arshad F. Syed wrote:
>>>Is there anyone here who has actually converted their C code to C++ while
>>>keeping the NPS the same or even managed to improve upon it? Sune did mention
>>>that he did so only at the cost of 15% degradation in speed. Why do people even
>>>bother wasting their time with C++ for a chess program when it inevitably leads
>>>to taking a hit in NPS? Regardless, of all the advice to simply inline the
>>>functions, I have yet to see a living example of chess program code which
>>>performed faster in C++.
>>The biggest hit you take in C++ is virtual functions, and inevitably people use
>>them. The virtual functions are very damaging due to the fact that they're
>>implemented with a function pointer, and at the machine level it translates into
>>an indirect call. The indirect calls branch mispredict because the CPU can't
>>figure out ahead-of-time where you're calling to, and branch mispredicts are
>Please excuse my ignorance, but where is the branch?  It's an indirect jump,
>isn't it?  There's nothing optional about it, it must be followed.  Is it that I
>am not familiar enough with the terminology?  e.g. I am thinking that by "branch
>prediction" you may be including any operation that moves the program counter in
>any way other than advancing sequentially to the next instruction.
>In any case, I'm confident that if you have a serious inheritance hierarchy (as
>opposed to a toy example) you're not going to lose any sleep over the virtual
>function call.  Many if-then-elses aren't efficient either.

A branch is anything that changes the flow of execution from the pc++ scheme. An
unconditional jump/call is still a branch.

The problem that superscalar processors have is that they're executing multiple
things simultaneously, and often to predict the target of a branch, you need to
have completed execution of instructions almost immediately prior.

Consider the example I gave in a previous post:
mov    esi, [ecx+SomeFuncPtr]
call   esi
call   esi

This is an indirect call. The machine loads the function pointer into the esi
register, and then the contents of that register are used as the target of the
unconditional branch. The problem is that the machine doesn't get around to
storing into esi until a cycle or so before the call, and the pipeline is some
20+ cycles deep (depending on processor). After it decodes the call instruction,
it needs to know where to go next, but it has to stall for those 20 cycles until
it knows what gets written to esi.

Static prediction assumes that it's not going to branch (in this case).
Obviously we know that it does, but rather than stall the pipeline for those 20+
cycles until they know WHERE it goes to, they choose simply to continue decode
at the wrong place. (If you make a small function call, this is actually
marginally advantageous over doing nothing since you've already got the return
address in the cache.)

In the case of an if/then/else, some implementations will work on BOTH branches
so you don't lose as much. Either way, the if/then/else can predict right, and
static prediction can NEVER get it right the first time.


This page took 0.04 seconds to execute

Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.