Author: Vincent Diepeveen
Date: 00:41:43 06/12/98
On June 11, 1998 at 17:45:05, Eugene Nalimov wrote:

>>>no argument from me there, because your comments don't affect what I
>>>told vincent... branches will not slow his program 400%, ever. 25%
>>>was the absolute upper limit I would predict assuming his branches are
>>>so evenly taken/not-taken that prediction fails every time. 9 clocks
>>>is less than a cache-miss...
>>
>>More like 5 to 6 times.
>>
>>Same for other programs.
>>
>>I generate knight moves in 1 clock cycle.
>>
>>Branch misprediction + partial register stall cost 22 clocks if I'm unlucky.
>
>Are you sure you have partial register stalls? P6/PII specially
>recognize sequence of commands
> XOR reg32, reg32
> MOV reg8, mem8
> use reg32
>
>and
>
> SUB reg32, reg32
> MOV reg8, mem8
> use reg32
>
>and there is no partial stall in that case (AP-526, section 3.9,
>"Partial Register Penalties").

Yes, now you give me a compiler that works correctly in this respect.
Even better: a compiler that uses the cmov instruction to avoid those
stupid branch mispredictions in a number of cases.

The main problem is that I have a C program, so its speed depends on how
good the compiler is. To a certain extent you can force a particular
compilation, but not everything.

>Eugene
>
>>
>>The first time it gets into that routine it predicts wrong. The next 7
>>times it predicts right. Then the 8th time it predicts wrong for the
>>knight (assuming it has 8 squares).
>>Every generation I have a partial register stall.
>>
>>I have the same trouble in my mobility evaluation, and in all other
>>evaluations. If I get something out of board[64] from the L1 cache,
>>this is damned fast, but the compare doesn't get predicted correctly,
>>as it's a complex evaluation, hard to predict.
>>
>>So branch prediction is the main problem. Beyond that, on the Pro200
>>partial register stalls are also a problem, but minor compared to
>>branch prediction.
>>
>>You count it.
>>
>>I have put a lot of effort into getting the data structures within the
>>L2 cache, and now move generation altogether fits within 25 kilobytes.
>>
>>That's way less than a 256 KB cache, or the coming caches.
>>No cache misses, and after program start the cache gets filled within a
>>millisecond.
>
>>>1.5% I remember.. but remember, that is nowhere near the 400% faster
>>>figure Vincent mentioned, "if branch prediction was better". That won't
>>>ever happen...

Now what's more likely: BSF/BSR becoming faster in the future, or
mispredicted branches becoming faster? :)
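The knight generation and mobility loops described in the quoted text can be sketched roughly as follows. This is a minimal illustration, not Vincent's code: it assumes a hypothetical square numbering (a1 = 0 ... h8 = 63), a `board[64]` array where 0 means empty, and made-up names `knight_to`, `knight_count` and `knight_mobility`. Destinations are precomputed so the loop only reads small tables. The empty-square test is added as a 0/1 value instead of guarded by an `if`, which gives the compiler the option of SETcc or cmov rather than a conditional jump; whether it takes that option depends on the compiler. The loop-exit test is still a branch whose outcome changes with the number of destinations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: square = rank * 8 + file, a1 = 0 ... h8 = 63. */
enum { EMPTY = 0 };

static uint8_t knight_to[64][8]; /* destination squares per origin square */
static int     knight_count[64]; /* number of valid destinations (2..8)   */

/* Fill the tables once at startup (768 bytes in total). */
void init_knight_tables(void)
{
    static const int df[8] = { 1, 2, 2, 1, -1, -2, -2, -1 };
    static const int dr[8] = { 2, 1, -1, -2, -2, -1, 1, 2 };

    for (int sq = 0; sq < 64; sq++) {
        int file = sq % 8, rank = sq / 8, n = 0;
        for (int i = 0; i < 8; i++) {
            int f = file + df[i], r = rank + dr[i];
            if (f >= 0 && f < 8 && r >= 0 && r < 8)
                knight_to[sq][n++] = (uint8_t)(r * 8 + f);
        }
        knight_count[sq] = n;
    }
}

/* Count empty destination squares. The comparison result (0 or 1) is
 * added directly, so there is no data-dependent if; the compiler may
 * emit SETcc/cmov here instead of a conditional jump. */
int knight_mobility(const unsigned char board[64], int sq)
{
    assert(sq >= 0 && sq < 64);
    int mobility = 0;
    for (int i = 0; i < knight_count[sq]; i++)
        mobility += (board[knight_to[sq][i]] == EMPTY);
    return mobility;
}
```

Call `init_knight_tables()` once before using `knight_mobility()`; a knight on a1 then has 2 possible destinations and one on d4 has 8.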
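The closing question contrasts bit-scan instructions with mispredicted branches. A minimal sketch of the bitboard style that leans on a bit scan instead of per-square compares, assuming GCC or Clang (`__builtin_ctzll` is a compiler builtin that is typically compiled to BSF or TZCNT on x86) and a hypothetical a1 = bit 0 ... h8 = bit 63 layout. The names `knight_attacks`, `pop_lsb` and `gen_knight_moves` are illustrative only. The per-square tests disappear; what remains is the loop-exit branch and the cost of the bit scan itself.

```c
#include <assert.h>
#include <stdint.h>

/* Knight attack set for one square, built with shifts; the file masks
 * stop moves from wrapping around the board edge. a1 = bit 0. */
uint64_t knight_attacks(int sq)
{
    assert(sq >= 0 && sq < 64);
    const uint64_t not_a  = 0xfefefefefefefefeULL; /* clears file a    */
    const uint64_t not_ab = 0xfcfcfcfcfcfcfcfcULL; /* clears files a,b */
    const uint64_t not_h  = 0x7f7f7f7f7f7f7f7fULL; /* clears file h    */
    const uint64_t not_gh = 0x3f3f3f3f3f3f3f3fULL; /* clears files g,h */
    uint64_t b = 1ULL << sq;

    return ((b << 17) & not_a)  | ((b << 15) & not_h)
         | ((b << 10) & not_ab) | ((b << 6)  & not_gh)
         | ((b >> 17) & not_h)  | ((b >> 15) & not_a)
         | ((b >> 10) & not_gh) | ((b >> 6)  & not_ab);
}

/* Remove and return the index of the lowest set bit; *bb must be nonzero. */
static int pop_lsb(uint64_t *bb)
{
    int sq = __builtin_ctzll(*bb); /* typically BSF/TZCNT with GCC/Clang */
    *bb &= *bb - 1;                /* clear that lowest bit */
    return sq;
}

/* Write the destinations of a knight on sq into out (ascending order)
 * and return their count. own = squares occupied by the mover's pieces. */
int gen_knight_moves(int sq, uint64_t own, int out[8])
{
    uint64_t targets = knight_attacks(sq) & ~own;
    int n = 0;
    while (targets)                /* the only branch left: loop exit */
        out[n++] = pop_lsb(&targets);
    return n;
}
```

With an empty board, `gen_knight_moves(0, 0, out)` returns 2 and fills `out` with squares 10 and 17 (c2 and b3).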
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.