Author: Dan Newman
Date: 15:08:39 09/29/99
Go up one level in this thread
On September 29, 1999 at 00:23:43, Robert Hyatt wrote: >On September 28, 1999 at 19:43:04, Dan Newman wrote: > >>On September 28, 1999 at 09:42:52, Robert Hyatt wrote: >> >>>On September 27, 1999 at 05:25:05, Dan Newman wrote: >>> >>>>On September 25, 1999 at 13:56:29, Vincent Diepeveen wrote: >>>> >>>>>We should compare speeds. >>>>> >>>>>What i do after 1.e4,e5 2.d4,d5 is >>>>>generate the move list a few million times (n times). >>>>>Then i divide the time by (n * #semi legal moves). >>>>> >>>>>#nodes a second generated at a PII450 for diep is 15.5M >>>>> >>>>>Haven't seen anyone faster so far, though most are not >>>>>reveiling their speed. >>>>> >>>>>Note that if we have a merced within say 5 years that i >>>>>can implement bitboards for a few eval terms within a day by then. >>>>> >>>>>No need for rotated bb of course as i keep my current generator >>>>>and makemove. >>>>> >>>>>Majority of code is not faster using bitboards even at 64 bits >>>>>machine. >>>>> >>>> >>>> >>>>I tried this position out on the (non-rotated) bitboard engine that I'm just >>>>starting. I compiled with MSVC++5 and get 10.25 M moves generated/s on a >>>>P6/200. So, (ignoring cache speed differences) I should get something near >>>>2.25 * 10.25 == 23.6 Mm/s on the PII450. >>>> >>>>OTOH, my (more developed) 0x88 engine gets 11.3 Mm/s on this position, which >>>>is slightly better than the bitboard program (roughly 25 million moves >>>>generated per second on the PII450). It seems like if I were to go to a 64 >>>>bit machine, the bitboard version would benefit somewhat more and would >>>>probably go faster than the 0x88. >>>> >>>>Of course these measurements, made by repetitively calling the move generator >>>>in a tight loop for the same position, may not tell us much because of (among >>>>other things) the branch prediction hardware. This will artificially inflate >>>>these numbers. In real life when the move generator is called with different >>>>positions and is continually getting blown out of cache, it may perform quite >>>>differently... >>>> >>>>Actually, the speed at which moves are generated is not a good reason to >>>>choose between bitboards and 0x88, or whatever. The last time I profiled >>>>my 0x88 program, the move generation was taking about 20%. The SEE was >>>>taking 25% and the eval 4%. If I ever get a decent amount of eval added >>>>(say 50% of cpu time) this will greatly depress the move generation figure. >>>>So a 10% change in move gen speed might change the overall speed by 1% in >>>>a well developed chess program. >>>> >>>>Now, if my (non-rotated) bitboards can help speed up the SEE or some of >>>>the (as yet unwritten) eval terms, it might be a good deal faster than >>>>the 0x88--but I don't really have a good feel for this yet. >>>> >>>>I do like the way bitboards replace the piece-list structures though. It >>>>was always a drag trying to figure out what to do with captured pieces, >>>>where to put promoted pieces, and so forth. All this goes away with >>>>bitboards. I think this is my favorite aspect of bitboards at this >>>>point. >>>> >>>>-Dan. >>> >>> >>>It sounds like you are going very fast. Are you using incremental update or >>>just direct computation of the attack bitmaps when you need them as I do? >>> >>>Bob >> >>I haven't actually gotten that far yet. Currently I've just got the >>move generator and part of make(). I suspect I'll just calculate >>them as I need them, but I'll have to see. Since I'm not doing the >>rotated bitboards, I only get pseudo attacks for the sliders and must >>scan the board on occasion. The problem with incremental calculations >>is that you incur the cost on every call to make() whether they get >>used on not... >> >>Anyway, my bitboard move generator (on the P6) is almost as fast as >>my 0x88 generator. The last time I tried bitboards they were about >>half the speed of 0x88. (BSF works wonders.) Also, I had been >>compiling with Watcom. When I switched to MSVC5 I got a 17% speedup >>on my 0x88 program. On the bitboard move generator, I got 30%! I >>suspect the MS compiler knows more about the P6. >> >>-Dan. > > >Still sounds very fast. IE you are generating all the moves, which means you >are doing a very good job of handling sliders already... When I first tried >this, I had a loop that went thru the 4 directions one at a time to locate the >'blocking piece'. And required 4 calls to bsf/bsr (depending on the direction) >to find this blocking square so that a mask could be used to cut off attacks >beyond that point... > What I'm doing for the slider moves is scaning the board in each of the 4 directions instead of extracting the to-squares from a bitboard, more or less just like I do for the 0x88 generator. I grab the coordinate of the last possible square from a table. I am also ORing the queen and bishop bitmaps together (and queen and rook) before looping through the pieces so that I don't have to have a separate queen section--I don't know if it helps much, but it makes the code shorter: bitboard_t bishops = plist[BISHOP] | plist[QUEEN]; while( bishops.non_empty() ) { int from = bishops.next_one(); int maxto = sw_diag[from]; for( int to = from-9; to >= maxto; to -= 9 ) { if( board[to] ) break; (movep++)->add_move( from, to); } maxto = se_diag[from]; etc. } >for comparison, try crafty if you have time. set up the same position you are >testing with, and then type "perf". On my PII/300 notebook, I get about 5M >moves generated per second... this is just calling the move generator over >and over for the same position, to make it use a reasonable amount of time... > >here is what you might get from the opening: > >White(1): perf >generated 2000000 moves, time=0.53 seconds >generated 3773585 moves per second >generated/made/unmade 2000000 moves, time=2.02 seconds >generated/made/unmade 990099 moves per second > >the first two lines are just repeated calls to the move generator... In >my case one call to GenerateCaptures() and one call to GenerateNonCaptures(). >repeated 100K times. The second two lines generates the same set of moves, >then makes/unmakes each one once. Then repeat the whole process again.. >100K times... Here's what I got on my P6/200 with Crafty 16.16 on Vincent's position, "rnbqkbnr/ppp2ppp/8/3pp3/3PP3/8/PPP2PPP/RNBQKBNR w KQkq -": White(1): perf generated 3800000 moves, time=1.26 seconds generated 3015872 moves per second generated/made/unmade 3800000 moves, time=4.56 seconds generated/made/unmade 833333 moves per second In the runs I did with my bitboard generator (earlier) I was only looping on the non-capture generator, so for proper comparison I re-did it with both generators being called. I get 7.11 M move/s on the P6/200. Also, if I run only 100 k cycles like you are doing, this drops to 4 M moves/s. I don't quite know what accounts for this, but if you run a benchmark like this longer, the speed seems to go up--dramatically. There seems to be a sort of settling in time. I can't imagine that the branch prediction hardware takes all that long to settle in. I suppose there is some initial overhead loading in stuff from memory that's getting counted... -Dan.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.