Author: Robert Hyatt
Date: 14:59:13 06/20/02
On June 20, 2002 at 16:50:46, Eugene Nalimov wrote:

>Ok, here is why I don't expect the speedup will be more than ~10%. I already
>gave that example here a year or two ago, but I am posting it again.
>
>Let's assume that we have a superscalar CPU with a (moderate) pipeline length
>of 12. Each cycle it can issue 2 instructions. Each memory reference takes 2
>cycles (data is mainly in the L1 cache), a register operation takes 1 cycle,
>and a hard-to-predict branch takes 6 cycles (in half of the cases we get it
>right, so there is no penalty; in the other half there is a 12-cycle penalty
>for the mispredicted branch). The model is oversimplified, but it will give
>you an idea.
>
>In the source we have something like
> if (p->pawns & some_bit_mask)
>   ...
>On the 64-bit CPU we'll have
> ld  R1=[R2+offsetof(pawns)]
> and R3=R1,some_bit_mask
> cmp R3,0
> bne ...
>Total time is 10 cycles.

Right... But shouldn't we depend on the compiler to "lift" another similar set
of instructions from below, and interleave them so that they schedule in pairs
or triples as well? I.e., the Cray compiler certainly does this well...

>On the 32-bit CPU we'll have
> ld  R1=[R2+offsetof(pawns)]
> ld  R3=[R2+offsetof(pawns)+4]
> and R4=R1,low_4_bytes(some_bit_mask)
> and R5=R3,high_4_bytes(some_bit_mask)
> or  R6=R4,R5
> cmp R6,0
> bne ...
>Total time is 11 cycles.
>
>And there are surprisingly many places in the code where even a
>bitboard-oriented program works with "native length" data -- all operations
>with alpha, beta, score, loop counters, etc.
>
>Eugene

I don't disagree there, to a point. But in places like movegen, evaluate, and
make/unmake, the majority of the things being fiddled with are 64-bit values.
That is why I have never tried to imply that a 64-bit machine would be twice
as fast, period. But it should be twice as fast on the parts of the engine
that are really beating on 64-bit values, such as the above ones... Even the
Cray had both 32- and 64-bit registers, so that you could be doing 32-bit
stuff (mainly address/index calculations) while the 64-bit registers were
doing 64-bit stuff.

>On June 20, 2002 at 16:02:04, Robert Hyatt wrote:
>
>>On June 20, 2002 at 15:58:29, Tom Kerrigan wrote:
>>
>>>On June 20, 2002 at 15:25:36, Sune Fischer wrote:
>>>
>>>>So what you are saying is that you can't just count the number of
>>>>operations and use that to predict the speed?
>>>
>>>Counting the operations is difficult. You can't just go through the source
>>>code and count them, because that doesn't tell you how often the code is
>>>run.
>>>
>>>>Now if Crafty is 50% 64-bit operations, then we can expect a factor of
>>>>two in speedup on that 50%, right?
>>>
>>>I think Crafty must be << 50% 64-bit operations. Think about all the data
>>>that Crafty must operate on that isn't bitboards...
>>>
>>>-Tom
>>
>>Which data is that? The top 80% in profiling depends heavily on bitboards:
>>generating moves, evaluating positions, updating the bitmaps in make/unmake,
>>detecting checks, evaluating Swap().
>>
>>I doubt it is << 50% (where << typically means "much less than").
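
[Editor's note: for readers following the cycle counts above, here is the same
comparison written out in C rather than pseudo-assembly. This is only a sketch
of the idea; the struct and function names are illustrative, not Crafty's
actual declarations.]

    #include <stdint.h>

    typedef struct {
        uint64_t pawns;            /* one native 64-bit bitboard */
    } Position64;

    typedef struct {
        uint32_t pawns_lo;         /* low 32 bits of the bitboard */
        uint32_t pawns_hi;         /* high 32 bits of the bitboard */
    } Position32;

    /* 64-bit CPU: one load, one AND, one compare-and-branch. */
    int pawn_test_64(const Position64 *p, uint64_t some_bit_mask)
    {
        return (p->pawns & some_bit_mask) != 0;
    }

    /* 32-bit CPU: the same test needs two loads, two ANDs, and an OR
       before the compare -- the extra cycle Eugene counts. */
    int pawn_test_32(const Position32 *p, uint32_t mask_lo, uint32_t mask_hi)
    {
        return ((p->pawns_lo & mask_lo) | (p->pawns_hi & mask_hi)) != 0;
    }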
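
[Editor's note: and a sketch of Hyatt's "lifting" point, again with
illustrative names. When two tests against the same loaded bitboard are
independent, a scheduling compiler can interleave their instructions so the
ANDs issue in the same cycle rather than back to back.]

    /* Two independent tests of one bitboard: the single load feeds both
       ANDs, and neither AND depends on the other, so a dual-issue CPU
       can pair them instead of serializing. */
    int pawns_on_either_mask(const Position64 *p,
                             uint64_t mask_a, uint64_t mask_b)
    {
        uint64_t bb = p->pawns;
        return ((bb & mask_a) != 0) | ((bb & mask_b) != 0);
    }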