Author: Robert Hyatt
Date: 07:36:18 09/04/03
Go up one level in this thread
On September 04, 2003 at 03:48:31, Johan de Koning wrote: >On September 03, 2003 at 14:53:23, Robert Hyatt wrote: > >>I finally found time to run a few tests. >> >>First, the set-up: I took all my bitmap stuff, added it up, and put it in >>a structure that was then set up as an array. IE tdata[128] was an array >>of these structures, each element being just enough bytes to hold all of what >>I call a "position" including bitmap boards, hash signature, 50 move counter, >>etc...) >> >>At the top of MakeMove() I added tdata[ply+1]=tdata[ply]; to do the copy >>stuff. That's all. >> >>Some results: >> >>fine 70 searched to 39 plies deep: >> >>copy/make : 15.5 seconds, make/unmake 14.0 seconds (11% overhead) >> >>kopec 22 searched to 12 plies deep: >> >>copy/make : 32.4 seconds, make/unmake 28.3 seconds. (14.5% overhead) >> >>mate position searched 9 plies deep (mate in 10) >> >>copy/make : 8.9 seconds, make/unmake 7.5 seconds. (18.7% overhead) > >Thanks for finding the time. > >I do however have some suggestions to create even more valuable data. >(Would you have expected anything else? :-) > >(a) Name hardware! > Preferrably also name system, software, compiler. Sorry. I realized that was omitted last night. Hardware: Xeon 2.8ghz, 1 gig RAM, running linux (redhat 9). Compiler was the latest Intel C compiler version 7.something. This is a Dell PowerEdge 2600 dual xeon box, although I only used one proceessor for the test to avoid SMP variability. > >(b) List the exact positions. > Include some normalish positions. > List N/s. > To avoid the suspicion that fast eval and lazy eval cause these particular > positions to have a particularly high N/s (hence high copy cost). I gave two of the positions, fine 70 and kopec22. The third was a mate position from ACC4. Here are the FENs for all three: fine70: /k/3p/p2P1p/P2P1P///K/ w kopec22: 2r2rk1/1bqnbpp1/1p1ppn1p/pP6/N1P1P3/P2B1N1P/1B2QPP1/R2R2K1 b ACC4: rnbq1b1r/p1pp1p1p/4k3/1p1NP1p1/2QP1p2/5N2/PP1B1KPP/n6R w > >(c) Make the copying dependent on a global flag. > Eg: int DoCopyIf12 = 13, to make sure the memory layout is exactly the same > for both runs. I did that already. That's how the tests were run. > >(d) Run several times. > Just to get an impression of the variance in the timing. > Did that too as it is something my normal "test" script does due to the oddball variability of SMP stuff. The NPS varies _very_ little. Most of the runs show no variability at all, every now and then an unfavorable layout in memory can cause some minor cache issues to show up, but way under 1% variance. >(e) Check the code (or list it). > To make sure the compiler doesn't do anything stupid (it happens). I always do that. However, I was sure there would be no tricks here as the array of structs is global and I compile each module of the program separately so that no global optimization can find the copy was not needed. I can cut/ paste the front of MakeMove() to show that the copy was actually done _every_ time MakeMove() was called. > >>That's not all the story. however. The copy/make approach requires an >>extra register everywhere since the data structures have to be accessed >>through a pointer (or via an array subscript, same thing). My test case >>does not take care of that. But if you were to mark one register as >>"unusable" for the compiler, the result would be worse, for certain. Since >>these data values are accessed all over the place, a register has to be used >>everywhere, which is going to add to the above, significantly. > >Indeed, "all the story" is always YMMV. >On a RISCish machine registers are cheap while absolute adressing is expensive. >Hence addressing data through struct pointers is usually faster. On the 386 >registers are bloody frikking expensive. But pointers (in general) save you a >lot of code space. :-) Actually, on RISC-type machines direct addressing is not expensive. It is simply not possible. IE the SPARC. You have to first put the address in a register, then do a load using that register (plus others if needed for subscripting, etc.) I'm always concerned on the X86 since registers are almost missing in action. > >> If it only adds >>10% then the above numbers are back to what I originally saw when I speeded >>things up by 25% by getting rid of copy/make. > >A fictional 10% won't suffice to even the balance, since you added prepare_undo >to make() and unmake() to search(). A fictional 20% would probably do the job, >but I won't buy that. I'm not sure I follow. The copy added about 15% overhead. If I take out the cost of unmaking, then that overhead is going up. At present, with make and unmake, the copy _still_ added 15% more, without even considering the issue of losing one of four registers for the array accesses all over the place. Again, all I can say about this is that for Crafty, on the P6/200, removing the copy and adding Unmake() speeded me up 25%. This will almost certainly vary program by program, and probably even with different hardware. But it seems pretty consistent from the P6 to the PIV, based on the simple copy test I ran so far... > >... Johan > >>That's data for Crafty. YMMV of course...
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.