Author: Bo Persson
Date: 04:38:01 08/22/03
On August 22, 2003 at 02:53:06, Johan de Koning wrote:

>On August 21, 2003 at 11:29:49, Robert Hyatt wrote:
>
>>On August 21, 2003 at 03:16:35, Johan de Koning wrote:
>>
>>>On August 20, 2003 at 14:27:57, Robert Hyatt wrote:
>>>
>>>>On August 20, 2003 at 03:59:38, Johan de Koning wrote:
>>>>
>>>>>On August 19, 2003 at 22:11:14, Robert Hyatt wrote:
>>>>>
>>>>>>On August 19, 2003 at 20:06:58, Mathieu Pagé wrote:
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>The fact:
>>>>>>>
>>>>>>>I have this question. I read somewhere that it is faster to unmake a move
>>>>>>>than to save the state of the game before moving then restoring it when we want
>>>>>>>to unmake the move.
>>>>>>>
>>>>>>>For the moment my engine does not implement unmake() (it is still buggy).
>>>>>>>
>>>>>>>My thought:
>>>>>>>
>>>>>>>Since bitboard computations are slow (on 32-bit hardware), I think that it can be
>>>>>>>slower to unmake the move than to save the state. A friend of mine who is a lot
>>>>>>>better than me at optimizing code also thinks that.
>>>>>>>
>>>>>>>My questions:
>>>>>>>
>>>>>>>Are you all using an unmake() function, or are there some of you who found that
>>>>>>>saving the state is better?
>>>>>>
>>>>>>
>>>>>>
>>>>>>Read the comments from Crafty in main.c. I started out using what is
>>>>>>commonly called "copy/make" as that worked well in Cray Blitz. But it
>>>>>>didn't work well on the PC. The PC has very limited memory bandwidth,
>>>>>>when you compare the speed of memory to the speed/demands of current
>>>>>>processors. If you keep the board in cache, and update it there, it is
>>>>>>more efficient than to copy it from real memory to real memory...
>>>>>
>>>>>I hate to play Vincent here, but real memory is not an issue.
>>>>>
>>>>>If you manage to keep the deepest few plies worth of position structs in L1
>>>>>cache, then bandwidth is pretty decent on the PC. And it has been ever since them
>>>>>PCs were endowed with cache.
>>>>
>>>>Sure, but look at what happens. You copy a couple of hundred bytes. You
>>>>update it _once_. Then you copy it again for the next ply. And so on. Not
>>>>only are you not re-using what you moved around early, you are displacing good
>>>>stuff from the cache as well.
>>>
>>>You *are* re-using the stuff that you didn't change, by skipping the unmake()
>>>while backing up. And yes, you are claiming more cache space. But only the very
>>>few most active copies are relevant.
>>
>>Not quite. I regularly hit 50+ plies deep. By the time I back up to ply
>>20, that is long-gone from cache. And it gets re-loaded.
>
>And this happens quite often.
>Particularly if you have a branching factor of 1.01 or something. :-)
>
>> As I said, I'm
>>not guessing here. I originally did it via copy/make and looked at the
>>performance after changing to make/unmake. For the data structure size I am
>>using in Crafty, Copy/Make hurt performance. Note that I do a small bit of
>>copy/make now, but not for the big bitboard structures, just for hash signatures
>>and the like.
>>
>>
>>>
>>>>I'm not really guessing here. I did it both ways. My bitmap stuff was, at the
>>>>time, something like 168 bytes. When I got rid of copy/make and went to
>>>>make/unmake, I gained over 25% in raw speed, because _all_ of the bitmaps sit
>>>>in cache and stay there.
>>>
>>>It does of course depend on the amount and nature of the changes, as well as on
>>>the copy size and cache size. More importantly, would it influence performance
>>>at all? For a "slow" engine like mine 168 bytes is peanuts, since it would be
>>>copied in (eg Athlon Thunderbird) 168/4*3 cycles.
>>>
>>>Hence I dare to ask: 25% of what?
>>
>>NPS went _up_ by 25%+. So total engine speed.
>>
>>This was changed in Crafty version 9.16, which dates back many years.
>
>Whoah! This is *very* hard to believe.

I had a similar experience when trying to get rid of the unmake() in my (private)
engine by copying the bitmaps. It turned out to be quite a bit slower, maybe not
25% but in the 10-20% range. This was for a slightly older architecture with a
10x CPU/memory clock ratio, but I doubt that the P4 could turn this the other
way round.

>There must have been something severely wrong with 9.15 then (continuing cache
>thrashing comes to mind, but that's just guessing). More likely, this number does
>not come from a clean comparison of copy/make versus make/unmake.

It's not unreasonable, if we try to estimate the amount of memory traffic the
copies generate. If we use Bob's figures of about 256 bytes per position and
2.4M nps, that would be about 0.6 GB/s just for this structure. That could
easily be 20-25% of the usable memory bandwidth!


Bo Persson
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.