Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: C and C++ --- NPS

Author: Matt Taylor

Date: 20:00:27 12/26/02

Go up one level in this thread


On December 26, 2002 at 00:52:11, Robert Hyatt wrote:

>On December 25, 2002 at 14:13:47, Matt Taylor wrote:
>
>>On December 25, 2002 at 04:17:34, Dave Gomboc wrote:
>>
>>>On December 23, 2002 at 21:33:50, Matt Taylor wrote:
>>>
>>>>On December 23, 2002 at 21:29:08, Arshad F. Syed wrote:
>>>>
>>>>>Is there anyone here who has actually converted their C code to C++ while
>>>>>keeping the NPS the same or even managed to improve upon it? Sune did mention
>>>>>that he did so only at the cost of 15% degradation in speed. Why do people even
>>>>>bother wasting their time with C++ for a chess program when it inevitably leads
>>>>>to taking a hit in NPS? Regardless, of all the advice to simply inline the
>>>>>functions, I have yet to see a living example of chess program code which
>>>>>performed faster in C++.
>>>>>
>>>>>Regards,
>>>>>Arshad
>>>>
>>>>The biggest hit you take in C++ is virtual functions, and inevitably people use
>>>>them. The virtual functions are very damaging due to the fact that they're
>>>>implemented with a function pointer, and at the machine level it translates into
>>>>an indirect call. The indirect calls branch mispredict because the CPU can't
>>>>figure out ahead-of-time where you're calling to, and branch mispredicts are
>>>>expensive.
>>>
>>>Please excuse my ignorance, but where is the branch?  It's an indirect jump,
>>>isn't it?  There's nothing optional about it, it must be followed.  Is it that I
>>>am not familiar enough with the terminology?  e.g. I am thinking that by "branch
>>>prediction" you may be including any operation that moves the program counter in
>>>any way other than advancing sequentially to the next instruction.
>>>
>>>In any case, I'm confident that if you have a serious inheritance hierarchy (as
>>>opposed to a toy example) you're not going to lose any sleep over the virtual
>>>function call.  Many if-then-elses aren't efficient either.
>>>
>>>Dave
>>
>>A branch is anything that changes the flow of execution from the pc++ scheme. An
>>unconditional jump/call is still a branch.
>>
>>The problem that superscalar processors have is that they're executing multiple
>>things simultaneously, and often to predict the target of a branch, you need to
>>have completed execution of instructions almost immediately prior.
>>
>>Consider the example I gave in a previous post:
>>mov    esi, [ecx+SomeFuncPtr]
>>call   esi
>>call   esi
>>
>>This is an indirect call. The machine loads the function pointer into the esi
>>register, and then the contents of that register are used as the target of the
>>unconditional branch. The problem is that the machine doesn't get around to
>>storing into esi until a cycle or so before the call, and the pipeline is some
>>20+ cycles deep (depending on processor). After it decodes the call instruction,
>>it needs to know where to go next, but it has to stall for those 20 cycles until
>>it knows what gets written to esi.
>>
>>Static prediction assumes that it's not going to branch (in this case).
>>Obviously we know that it does, but rather than stall the pipeline for those 20+
>>cycles until they know WHERE it goes to, they choose simply to continue decode
>>at the wrong place. (If you make a small function call, this is actually
>>marginally advantageous over doing nothing since you've already got the return
>>address in the cache.)
>>
>
>I'm not quite following here.  So far as I understand the processor, there is
>_no_ branch prediction for an unconditional branch, and it can _never_ be
>wrong.  Forward conditional branches are assumed not taken if they don't appear
>in the BTB, backward branches are assumed to be taken if they are not in the
>BTB, otherwise the BTB dictates.  But for _unconditional branches_ I don't
>believe _any_ of this applies.

Yes and no. There are two types of unconditional branches: direct and indirect.
The above rules are true for static prediction (when no BTB entry exists) except
for indirect branches.

The Intel Xeon optimization manual specifies that indirect branches are
predicted as not taken (section 2-15), presumably because it is difficult to
predict. I have read similar things about the Athlon, that it can't predict
indirect branches, but I cannot find the reference.

The problem here lies in the question, "How many indirect branches have the
target address over X cycles ahead of time where X is the latency of the
pipeline?" You have to also remember that function calls complicate things since
they destroy contents of certain registers. Other "preserved" registers don't
necessarily remain unchanged -- they simply get their value back at the end of
the function.

>>In the case of an if/then/else, some implementations will work on BOTH branches
>>so you don't lose as much. Either way, the if/then/else can predict right, and
>>static prediction can NEVER get it right the first time.
>>
>>-Matt
>
>Actually they can be right _most_ of time time if you write the code correctly.
>If there is a 50% probability, no.  But if one path is more common, then that
>path should be predicted almost all the time, which will make it right most of
>the time also...

Yes, with code written well. I believe I said the same thing to Vincent
recently. :-)

-Matt



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.