Computer Chess Club Archives


Subject: Re: Question to Bob: Crafty , Alpha and FindBit()

Author: Vincent Diepeveen

Date: 00:41:43 06/12/98


On June 11, 1998 at 17:45:05, Eugene Nalimov wrote:

>>>no argument from me there, because your comments don't affect what I
>>>told vincent...  branches will not slow his program 400%, ever.  25%
>>>was the absolute upper limit I would predict assuming his branches are
>>>so evenly taken/not-taken that prediction fails every time.  9 clocks
>>>is less than a cache-miss...
>>
>>more like 5 to 6 times.
>>
>>Same for other programs.
>>
>>I generate knight moves in 1 clock cycle.
>>
>>A branch misprediction plus a partial register stall costs 22 clocks if I'm unlucky.
>
>Are you sure you have partial register stalls? The P6/PII specially
>recognizes the instruction sequence
>
>    XOR reg32, reg32
>    MOV reg8, mem8
>    use reg32
>
>and
>
>    SUB reg32, reg32
>    MOV reg8, mem8
>    use reg32
>
>and there is no partial stall in that case (AP-526, section 3.9,
>"Partial Register Penalties").

Yes. Now show me a compiler that gets this right.

Even better: a compiler that uses the CMOV instruction to avoid that
stupid branch prediction in a number of cases.

The main problem is that I have a C program, so its speed depends on how
good the compiler is. To a certain extent you can force a particular
compilation, but not for everything.
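
A minimal sketch of what "forcing a certain compilation" can look like in C. This is not code from any real engine: EMPTY, count_empty_branchy, count_empty_branchless, select_masked and load_byte are names made up for the example. The source is written so that no conditional jump is needed in the hot path; whether a given compiler actually emits SETcc, CMOV or MOVZX for it still depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EMPTY = 0 };  /* hypothetical encoding of an empty square */

/* Branchy form: one compare-and-jump per target square. If the board
   contents are irregular, this jump is hard to predict. */
int count_empty_branchy(const uint8_t board[64], const int *targets, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(targets[i] >= 0 && targets[i] < 64);
        if (board[targets[i]] == EMPTY)
            count++;
    }
    return count;
}

/* Branch-free body: the comparison yields 0 or 1 and is added directly,
   so only the (well-predicted) loop branch remains. */
int count_empty_branchless(const uint8_t board[64], const int *targets, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(targets[i] >= 0 && targets[i] < 64);
        count += (board[targets[i]] == EMPTY);
    }
    return count;
}

/* Select a or b without a jump: build an all-ones or all-zeros mask from
   the condition and blend the two values with it. The final conversion
   back to int32_t assumes the usual two's complement behaviour. */
int32_t select_masked(int cond, int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0); /* 0xFFFFFFFF or 0 */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

/* Widening a byte through an unsigned type lets the compiler use one
   zero-extending load (typically MOVZX on x86), which writes the whole
   32-bit register instead of only its low 8 bits. */
uint32_t load_byte(const uint8_t *p)
{
    return (uint32_t)*p;
}
```

Both counting functions return the same result; the only difference is how much work is left to the branch predictor. It is worth checking the generated assembly, since nothing in the C standard guarantees which instructions come out.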

>Eugene
>
>>
>>The first time it gets into that routine it predicts wrong. The next 7 times
>>it predicts right. Then the 8th time it predicts wrong for a knight (assuming
>>it has 8 squares). On every generation I have a partial register stall.
>>
>>I have the same trouble in my mobility evaluation, and in all other
>>evaluations. If I get something out of board[64] from the L1 cache, that is
>>damned fast, but the compare doesn't get predicted correctly, because it's a
>>complex evaluation and hard to predict.
>>
>>So branch prediction is the main problem. On top of that, for the Pro200
>>partial register stalls are also a problem, but a minor one compared to
>>branch prediction.
>>
>>You count it.
>>
>>I have put a lot of effort into getting the data structures within the L2
>>cache, and now move generation all together fits within 25 kilobytes.
>>
>>That's way less than a 256 KB cache, or the coming caches.
>>No cache misses, and after program start the cache gets filled within a
>>millisecond.
>
>>>1.5% I remember.. but remember, that is nowhere near the 400% faster
>>>figure Vincent mentioned, "if branch prediction was better".  That won't
>>>ever happen...

Now what's more likely: BSF/BSR becoming faster in the future, or
mispredicted branches becoming faster? :)
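
For readers who want the two alternatives side by side, here is a rough C sketch. It is not Crafty's FindBit() or any other engine's code; first_bit_loop, first_bit_scan and serialize are hypothetical names, and the sketch assumes a GCC- or Clang-compatible compiler for __builtin_ctzll, which such compilers usually turn into a single bit-scan instruction on x86. Other compilers fall back to the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy scan: tests one bit per iteration. The number of iterations
   depends on the data, so the loop exit is hard to predict. */
int first_bit_loop(uint64_t bb)
{
    assert(bb != 0);
    int sq = 0;
    while ((bb & 1) == 0) {
        bb >>= 1;
        sq++;
    }
    return sq;
}

/* Hardware scan: index of the lowest set bit with no data-dependent
   branch, where the compiler provides the builtin. */
int first_bit_scan(uint64_t bb)
{
    assert(bb != 0);
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctzll(bb);
#else
    return first_bit_loop(bb);
#endif
}

/* Write the index of every set bit into out[], lowest first, and return
   how many were written. The only branch left is the loop test. */
int serialize(uint64_t bb, int out[64])
{
    int n = 0;
    while (bb != 0) {
        out[n++] = first_bit_scan(bb);
        bb &= bb - 1;  /* clear the lowest set bit */
    }
    return n;
}
```

Both scan functions return the same index for any nonzero input, so they can be swapped and timed against each other on the target machine.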


