Author: Dan Newman
Date: 18:15:33 01/29/01
On January 29, 2001 at 14:37:45, Dann Corbit wrote:
>On January 29, 2001 at 09:08:59, José Carlos wrote:
>
>> As I started to rewrite my book management code, I decided to rewrite the whole
>>engine in order to get more speed and be able to include new knowledge
>>without getting too slow.
>> With that idea, I tried changing my board[64] for a 0x88 move generator. Right
>>now, I've only written the move generator but, when I tested it to measure
>>any improvement in speed, I was amazed. In this position:
>>
>>[D]rnbqkbnr/ppp2ppp/8/3pp3/3PP3/8/PPP2PPP/RNBQKBNR w KQkq d6 0 3
>>
>> that, if I'm not wrong, is called "Vincent's test" (because Vincent Diepeveen
>>created it), where you have to generate all moves 2,000,000 times, I was getting
>>about 9,500,000 moves per second (8.5 seconds for the whole test) on my AMD
>>Athlon 550, with my old board[64].
>> Now, with 0x88 I'm getting about 14,500,000 mps (5.5 seconds), which is a huge
>>improvement (I don't remember the exact numbers; it happened last night and I'm
>>at work right now).
>> My questions:
>> - is my new number (14,500,000 moves per second) really fast for my hardware,
>>or was I really slow with my previous board[64]?
>> - does such an improvement in speed make sense for that change?
>> - could you please post your results in this test (and your hardware) for
>>your programs, so that I can compare.
>>
>> Tonight I'll continue with my make/unmake functions, which were my bottleneck
>>in Averno. I realized that the inCheck detection determined the speed of the
>>whole program. Is there any "smart" trick for fast in-check detection with 0x88?
>
>For something more interesting, try both Vincent's position and also this one:
>8/8/p1r5/6k1/KP6/8/8/5R2 b - -
>
>with both board formats.
Here's what Shrike (a non-rotated bitboard program) gets:
On a P3 (non-coppermine) at 600 MHz:
position     captures   non-captures   both together
-----------------------------------------------------
Vincent's       3.6         35.1           24.5
Dann's          0.0         40.7           30.2
On a P3 (coppermine) at 980 MHz:
position     captures   non-captures   both together
-----------------------------------------------------
Vincent's       6.1         62.3           43.6
Dann's          0.0         72.6           53.7
All the figures are million moves generated per second.
I've shown captures and non-captures separately since I
generate them with separate routines. The "both together"
column is generated by running the two move generators
alternately.
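To illustrate what separate capture and non-capture routines can look like with
bitboards, here is a minimal sketch for knights only. This is not Shrike's code:
the square mapping (bit 0 = a1, bit 63 = h8), the file masks, and the function
names are all assumptions made for the example.

```c
#include <stdint.h>

typedef uint64_t Bitboard;   /* assumed mapping: bit 0 = a1, bit 7 = h1, bit 63 = h8 */

/* Masks that discard moves which wrapped around the board edge. */
#define NOT_FILE_A  0xFEFEFEFEFEFEFEFEULL
#define NOT_FILE_H  0x7F7F7F7F7F7F7F7FULL
#define NOT_FILE_AB 0xFCFCFCFCFCFCFCFCULL
#define NOT_FILE_GH 0x3F3F3F3F3F3F3F3FULL

/* All squares attacked by the knights whose bits are set in `knights`. */
static Bitboard knight_attacks(Bitboard knights)
{
    return ((knights << 17) & NOT_FILE_A)
         | ((knights << 15) & NOT_FILE_H)
         | ((knights << 10) & NOT_FILE_AB)
         | ((knights <<  6) & NOT_FILE_GH)
         | ((knights >> 17) & NOT_FILE_H)
         | ((knights >> 15) & NOT_FILE_A)
         | ((knights >> 10) & NOT_FILE_GH)
         | ((knights >>  6) & NOT_FILE_AB);
}

/* Capture generator: attacked squares that hold an enemy piece. */
static Bitboard knight_capture_targets(Bitboard knights, Bitboard enemy)
{
    return knight_attacks(knights) & enemy;
}

/* Non-capture generator: attacked squares that are empty. */
static Bitboard knight_quiet_targets(Bitboard knights, Bitboard occupied)
{
    return knight_attacks(knights) & ~occupied;
}
```

The two generators share the attack set and differ only in the final mask, so
the capture routine never has to look at empty squares at all.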
The one that's really important is the capture generation
rate of course. In my case it gets called 10x as often as
the non-capture generator. Both together take only about
10% of the CPU time (wild guess at this point, but they
were maybe 15% some months back when I first wrote them).
But even if the capture generator took twice as long, it would
only cost 10% or so... These rates really aren't all that
critical as long as you're in the ballpark, but I guess I'd
jump through a lot of hoops to shave off the extra 10% :).
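The back-of-the-envelope estimate above can be written out as a small
function. This is only a sketch: the function name and the idea of a
"capture share" parameter are assumptions for illustration, and the 10% input
is the wild guess quoted above, not a measurement.

```c
/* Fractional increase in total run time if the capture generator becomes
 * `factor` times slower.
 *   movegen_share: fraction of total CPU time spent in both generators
 *   capture_share: fraction of that time spent in the capture generator
 *                  (hypothetical parameter; 1.0 gives the worst case)
 */
static double capture_slowdown(double movegen_share,
                               double capture_share,
                               double factor)
{
    return movegen_share * capture_share * (factor - 1.0);
}
```

With movegen_share = 0.10, capture_share = 1.0 and factor = 2.0 this gives
0.10, so doubling the capture generator's time costs at most about 10% overall.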
I guess that one should choose basic data structures to speed
up capture generation, even at the expense of non-capture
generation speed. In my case the SEE was one of the biggest
consumers of CPU time. I think bitboards helped a lot there.
And in eval they help a lot too.
You might be able to squeeze a bit more move generation speed
out of 0x88 than bitboards (though I'm not convinced), but
things like the SEE and eval and so forth may end up being
larger consumers of CPU cycles. If you concentrate on speeding
up the move generator it could end up being at the expense of
those other elements... (I speak from experience here, but I
always obsessively tweak my move generators anyway. :)
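For anyone comparing the two board formats, here is a minimal sketch of the
0x88 idea for knight moves. The names are made up for the example and it is
not taken from any particular engine. Squares are numbered rank * 16 + file,
so any index with a bit in 0x88 set is off the board, and one AND replaces
the separate file and rank range checks.

```c
/* 0x88 square index from file and rank, both in 0..7. */
static int sq0x88(int file, int rank)
{
    return rank * 16 + file;
}

/* Knight step offsets in 0x88 numbering. */
static const int knight_delta[8] = { 33, 31, 18, 14, -33, -31, -18, -14 };

/* Writes the on-board knight destinations from `from` into `out` and
 * returns how many there are. The (to & 0x88) test also rejects negative
 * indices, assuming the usual two's-complement int representation. */
static int knight_targets(int from, int out[8])
{
    int n = 0;
    for (int i = 0; i < 8; i++) {
        int to = from + knight_delta[i];
        if ((to & 0x88) == 0)
            out[n++] = to;
    }
    return n;
}
```

A knight on b1, sq0x88(1, 0), gets three destinations (c3, a3, d2); the
fourth positive offset lands on index 15, which has bit 0x08 set and is
rejected.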
-Dan.