Author: Robert Hyatt
Date: 13:52:42 09/03/03
On September 03, 2003 at 16:37:03, Gerd Isenberg wrote:

>On September 03, 2003 at 14:53:23, Robert Hyatt wrote:
>
>>I finally found time to run a few tests.
>>
>>First, the set-up: I took all my bitmap stuff, added it up, and put it in
>>a structure that was then set up as an array. IE tdata[128] was an array
>>of these structures, each element being just enough bytes to hold all of what
>>I call a "position" (including bitmap boards, hash signature, 50 move counter,
>>etc...)
>>
>>At the top of MakeMove() I added tdata[ply+1]=tdata[ply]; to do the copy
>>stuff. That's all.
>>
>>Some results:
>>
>>fine 70 searched to 39 plies deep:
>>
>>copy/make : 15.5 seconds, make/unmake 14.0 seconds (11% overhead)
>>
>>kopec 22 searched to 12 plies deep:
>>
>>copy/make : 32.4 seconds, make/unmake 28.3 seconds. (14.5% overhead)
>>
>>mate position searched 9 plies deep (mate in 10)
>>
>>copy/make : 8.9 seconds, make/unmake 7.5 seconds. (18.7% overhead)
>>
>>That's not the whole story, however. The copy/make approach requires an
>>extra register everywhere since the data structures have to be accessed
>>through a pointer (or via an array subscript, same thing). My test case
>>does not take care of that. But if you were to mark one register as
>>"unusable" for the compiler, the result would be worse, for certain. Since
>>these data values are accessed all over the place, a register has to be used
>>everywhere, which is going to add to the above, significantly. If it only adds
>>10% then the above numbers are back to what I originally saw when I speeded
>>things up by 25% by getting rid of copy/make.
>>
>>That's data for Crafty. YMMV of course...
>
>Interesting results. Not sure about the additional register, depends on the
>implementation.

I assume that _any_ pointer needs a register, whereas the global board stuff
does not. IE I know how to get a global w_pawns into a register, but if I
have w_pawns[i] I have to deal with i, which eats a register. That was my
point.
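For illustration, the copy step described in the test set-up might look roughly like the C sketch below. Only the array name tdata[128], the assignment tdata[ply+1]=tdata[ply], and the 232-byte payload rounded up to 256 bytes come from this post; the struct layout, the field names, the count of 27 bitboards, and the function make_move_copy are hypothetical stand-ins, not Crafty's actual code.

```c
#include <assert.h>   /* static_assert (C11) and assert */
#include <stdint.h>

#define MAX_PLY 128

/* Hypothetical per-ply "position" record.
   Payload: 27*8 + 8 + 4 + 4 = 232 bytes, padded to 256. */
typedef struct {
    uint64_t boards[27];   /* bitmap boards (the count is an assumption) */
    uint64_t hash_key;     /* hash signature */
    uint32_t fifty;        /* 50-move counter */
    uint32_t flags;        /* other small state (assumption) */
    uint8_t  pad[24];      /* round 232 bytes up to 256 */
} Position;

static_assert(sizeof(Position) == 256, "Position must be exactly 256 bytes");

static Position tdata[MAX_PLY];

/* Copy/make: duplicate the parent position, then modify only the copy.
   Every access below goes through p, i.e. through a pointer that must
   live in a register, which is the cost discussed in the post. */
void make_move_copy(int ply, int from, int to) {
    tdata[ply + 1] = tdata[ply];            /* the whole-struct copy */
    Position *p = &tdata[ply + 1];
    uint64_t mask = (1ULL << from) | (1ULL << to);
    p->boards[0] ^= mask;                   /* move one piece bit */
    p->hash_key ^= mask;                    /* placeholder, not real Zobrist hashing */
    p->fifty++;
}
```

With this layout no unmake step is needed: after returning from ply+1 the search simply continues with tdata[ply], which was never modified. The sketch does not force 256-byte alignment of the array, which a real test would also need to arrange.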
I am copying data in this test, but I am not using it anywhere, so there is no
register loss. If this were my old copy/make program, after copying all that
stuff, I would continually reference it with a subscript or via pointer to get
to the right instance of it.

> If you address an array with n and n-1, one register for n
>should be enough. I guess it's really the latency of the additional read cycles
>- plus the penalty for additional cache pollution.

I think cache is a real issue.

>
>How many bytes do you copy?

The actual number is, I believe, 232 bytes. But that turns into 256 obviously,
as it takes two cache lines.

>Are source and target adjacent and properly aligned?

Yes. I rounded the struct to 256 bytes, so that each element is exactly the
right size and properly aligned.

>Have you ever tried MMX-copy instead of memcpy?

Nope. Any performance numbers to compare them???

>
>movq mm0, [source]
>movq mm1, [source+8]
>...
>movq [target ], mm0
>movq [target+8], mm1
>...
>
>Another improvement in copymake is to combine the separate copy make
>read-writes.
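As a rough portable analogue of the movq sequence quoted above, the C sketch below copies a 256-byte block as 64-bit words, issuing a group of loads followed by a group of stores, next to a plain memcpy baseline. The function names copy256_words and copy256_memcpy are hypothetical, and a compiler is free to turn either version into whatever instructions it prefers, so this shows the access pattern only and makes no claim about relative speed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { POSITION_BYTES = 256, POSITION_WORDS = POSITION_BYTES / 8 };

/* Copy 256 bytes as 64-bit words: four loads, then four stores, per
   iteration, echoing the load-group / store-group shape of the movq code. */
void copy256_words(uint64_t *dst, const uint64_t *src) {
    for (int i = 0; i < POSITION_WORDS; i += 4) {
        uint64_t a = src[i];
        uint64_t b = src[i + 1];
        uint64_t c = src[i + 2];
        uint64_t d = src[i + 3];
        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
}

/* Baseline: let the library (or the compiler's built-in) do the copy. */
void copy256_memcpy(void *dst, const void *src) {
    memcpy(dst, src, POSITION_BYTES);
}
```

Any comparison between the two would have to be timed inside the real search, since the cache effects discussed above are likely to dominate a stand-alone microbenchmark.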
Last modified: Thu, 15 Apr 21 08:11:13 -0700