Computer Chess Club Archives




Subject: Re: C and C++ --- NPS

Author: Robert Hyatt

Date: 20:22:36 12/26/02

Go up one level in this thread

On December 26, 2002 at 23:00:27, Matt Taylor wrote:

>On December 26, 2002 at 00:52:11, Robert Hyatt wrote:
>>On December 25, 2002 at 14:13:47, Matt Taylor wrote:
>>>On December 25, 2002 at 04:17:34, Dave Gomboc wrote:
>>>>On December 23, 2002 at 21:33:50, Matt Taylor wrote:
>>>>>On December 23, 2002 at 21:29:08, Arshad F. Syed wrote:
>>>>>>Is there anyone here who has actually converted their C code to C++ while
>>>>>>keeping the NPS the same or even managed to improve upon it? Sune did mention
>>>>>>that he did so only at the cost of 15% degradation in speed. Why do people even
>>>>>>bother wasting their time with C++ for a chess program when it inevitably leads
>>>>>>to taking a hit in NPS? Regardless, of all the advice to simply inline the
>>>>>>functions, I have yet to see a living example of chess program code which
>>>>>>performed faster in C++.
>>>>>The biggest hit you take in C++ is virtual functions, and inevitably people use
>>>>>them. The virtual functions are very damaging due to the fact that they're
>>>>>implemented with a function pointer, and at the machine level it translates into
>>>>>an indirect call. The indirect calls branch mispredict because the CPU can't
>>>>>figure out ahead-of-time where you're calling to, and branch mispredicts are
>>>>Please excuse my ignorance, but where is the branch?  It's an indirect jump,
>>>>isn't it?  There's nothing optional about it, it must be followed.  Is it that I
>>>>am not familiar enough with the terminology?  e.g. I am thinking that by "branch
>>>>prediction" you may be including any operation that moves the program counter in
>>>>any way other than advancing sequentially to the next instruction.
>>>>In any case, I'm confident that if you have a serious inheritance hierarchy (as
>>>>opposed to a toy example) you're not going to lose any sleep over the virtual
>>>>function call.  Many if-then-elses aren't efficient either.
>>>A branch is anything that changes the flow of execution from the pc++ scheme. An
>>>unconditional jump/call is still a branch.
>>>The problem that superscalar processors have is that they're executing multiple
>>>things simultaneously, and often to predict the target of a branch, you need to
>>>have completed execution of instructions almost immediately prior.
>>>Consider the example I gave in a previous post:
>>>mov    esi, [ecx+SomeFuncPtr]
>>>call   esi
>>>call   esi
>>>This is an indirect call. The machine loads the function pointer into the esi
>>>register, and then the contents of that register are used as the target of the
>>>unconditional branch. The problem is that the machine doesn't get around to
>>>storing into esi until a cycle or so before the call, and the pipeline is some
>>>20+ cycles deep (depending on processor). After it decodes the call instruction,
>>>it needs to know where to go next, but it has to stall for those 20 cycles until
>>>it knows what gets written to esi.
>>>Static prediction assumes that it's not going to branch (in this case).
>>>Obviously we know that it does, but rather than stall the pipeline for those 20+
>>>cycles until they know WHERE it goes to, they choose simply to continue decode
>>>at the wrong place. (If you make a small function call, this is actually
>>>marginally advantageous over doing nothing since you've already got the return
>>>address in the cache.)
>>I'm not quite following here.  So far as I understand the processor, there is
>>_no_ branch prediction for an unconditional branch, and it can _never_ be
>>wrong.  Forward conditional branches are assumed not taken if they don't appear
>>in the BTB, backward branches are assumed to be taken if they are not in the
>>BTB, otherwise the BTB dictates.  But for _unconditional branches_ I don't
>>believe _any_ of this applies.
>Yes and no. There are two types of unconditional branches: direct and indirect.
>The above rules are true for static prediction (when no BTB entry exists) except
>for indirect branches.
>The Intel Xeon optimization manual specifies that indirect branches are
>predicted as not taken (section 2-15), presumably because it is difficult to
>predict. I have read similar things about the Athlon, that it can't predict
>indirect branches, but I cannot find the reference.

OK.  this sounds like a _bad_ fix.  IE intentionally predicting that the
branch won't be taken, knowing it will.  ANd probably using a bogus branch
address as well since it is unknown at the time.

The problem is that this will certainly interfere with SMT, which is a shame
and a waste, if that is what they are doing.

>The problem here lies in the question, "How many indirect branches have the
>target address over X cycles ahead of time where X is the latency of the
>pipeline?" You have to also remember that function calls complicate things since
>they destroy contents of certain registers. Other "preserved" registers don't
>necessarily remain unchanged -- they simply get their value back at the end of
>the function.
>>>In the case of an if/then/else, some implementations will work on BOTH branches
>>>so you don't lose as much. Either way, the if/then/else can predict right, and
>>>static prediction can NEVER get it right the first time.
>>Actually they can be right _most_ of time time if you write the code correctly.
>>If there is a 50% probability, no.  But if one path is more common, then that
>>path should be predicted almost all the time, which will make it right most of
>>the time also...
>Yes, with code written well. I believe I said the same thing to Vincent
>recently. :-)

This page took 0.01 seconds to execute

Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.