Computer Chess Club Archives


Subject: Re: copy/make vs make/unmake some data

Author: Gerd Isenberg

Date: 00:55:05 09/04/03


On September 03, 2003 at 16:52:42, Robert Hyatt wrote:

>On September 03, 2003 at 16:37:03, Gerd Isenberg wrote:
>
>>On September 03, 2003 at 14:53:23, Robert Hyatt wrote:
>>
>>>I finally found time to run a few tests.
>>>
>>>First, the set-up:  I took all my bitmap stuff, added it up, and put it in
>>>a structure that was then set up as an array.  IE tdata[128] was an array
>>>of these structures, each element being just enough bytes to hold all of what
>>>I call a "position" including bitmap boards, hash signature, 50 move counter,
>>>etc...)
>>>
>>>At the top of MakeMove() I added tdata[ply+1]=tdata[ply]; to do the copy
>>>stuff.  That's all.
>>>
>>>Some results:
>>>
>>>fine 70 searched to 39 plies deep:
>>>
>>>copy/make : 15.5 seconds, make/unmake 14.0 seconds  (11% overhead)
>>>
>>>kopec 22 searched to 12 plies deep:
>>>
>>>copy/make : 32.4 seconds, make/unmake 28.3 seconds. (14.5% overhead)
>>>
>>>mate position searched 9 plies deep (mate in 10)
>>>
>>>copy/make : 8.9 seconds, make/unmake 7.5 seconds.  (18.7% overhead)
>>>
>>>That's not the whole story, however.  The copy/make approach requires an
>>>extra register everywhere since the data structures have to be accessed
>>>through a pointer (or via an array subscript, same thing).  My test case
>>>does not take care of that.  But if you were to mark one register as
>>>"unusable" for the compiler, the result would be worse, for certain.  Since
>>>these data values are accessed all over the place, a register has to be used
>>>everywhere, which is going to add to the above, significantly.  If it only adds
>>>10% then the above numbers are back to what I originally saw when I speeded
>>>things up by 25% by getting rid of copy/make.
>>>
>>>That's data for Crafty.  YMMV of course...
>>
>>Interesting results. Not sure about the additional register, depends on the
>>implementation.
>
>I assume that _any_ pointer needs a register.  Where the global board stuff
>does not.  IE I know how to get a global w_pawns into a register, but if I
>have w_pawns[i] I have to deal with i, which eats a register.  That was my
>point.  I am copying data in this test, but I am not using it anywhere so there
>is no register loss.  If this were my old copy/make program, after copying all
>that stuff, I would continually reference it with a subscript or via pointer to
>get to the right instance of it.
>
>
>
>> If you address an array with n and n-1, one register for n
>>should be enough. I guess it's really the latency of the additional read cycles
>>- plus the penalty for additional cache pollution.
>
>I think cache is a real issue.
>
>
>>
>>How many bytes do you copy?
>
>
>
>The actual number is, I believe, 232 bytes.  But that turns into 256 obviously,
>as it takes two cache lines.
>
>>Are source and target adjacent and properly aligned?
>
>Yes.  I rounded the struct to 256 bytes, so that each element is exactly the
>right size and properly aligned.
>
>>Have you ever tried MMX-copy instead of memcpy?
>
>Nope.  Any performance numbers to compare them???

Explicit (unrolled) and grouped copy loops are considered a bit faster than
the vector-path rep movsd on Athlon-32. See the AMD Athlon™ Processor x86 Code
Optimization Guide, page 66.
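
To illustrate what "unrolled and grouped" means here, a minimal C sketch follows. It assumes the 256-byte record from the quoted test, viewed as 32 quadwords; position_t, POSITION_QWORDS and copy_position_grouped are hypothetical names, not taken from Crafty or from the AMD guide. A C compiler is free to reorder or vectorize the body, so this only shows the access pattern; the exact instruction grouping would need assembly or intrinsics.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical position record: 256 bytes = 32 quadwords. */
#define POSITION_QWORDS 32

typedef struct {
    uint64_t q[POSITION_QWORDS];
} position_t;

/* Copy one record in blocks of eight quadwords: first a group of eight
   loads, then a group of eight stores. The compiler may reschedule this;
   the source only expresses the intended grouping. */
static void copy_position_grouped(position_t *dst, const position_t *src)
{
    for (size_t i = 0; i < POSITION_QWORDS; i += 8) {
        /* group of loads */
        uint64_t a = src->q[i + 0];
        uint64_t b = src->q[i + 1];
        uint64_t c = src->q[i + 2];
        uint64_t d = src->q[i + 3];
        uint64_t e = src->q[i + 4];
        uint64_t f = src->q[i + 5];
        uint64_t g = src->q[i + 6];
        uint64_t h = src->q[i + 7];

        /* group of stores */
        dst->q[i + 0] = a;
        dst->q[i + 1] = b;
        dst->q[i + 2] = c;
        dst->q[i + 3] = d;
        dst->q[i + 4] = e;
        dst->q[i + 5] = f;
        dst->q[i + 6] = g;
        dst->q[i + 7] = h;
    }
}
```

Whether this beats memcpy or a plain struct assignment depends on the compiler and CPU, so it would have to be measured.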

Those few percent are probably not enough to favor copy/make.

But what if the incremental make update refers to just-written data that is
still in the store buffer but not yet in cache? There are some pitfalls with
Store-to-Load Forwarding Restrictions (page 86). I'm not sure whether it is an
issue here, but I guess it is favorable to combine the copy/make reads and
writes by incrementally updating registers before writing.
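
A rough sketch of the two orderings, assuming a heavily reduced position record: mini_pos_t, its fields and both function names are made up for illustration. The first variant copies and then read-modify-writes the fresh copy, so the update reloads data that was stored a moment before. The second loads from the parent, applies the update in locals and stores each field once.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced position record. */
typedef struct {
    uint64_t w_pawns;
    uint64_t occupied;
    uint64_t hash;
} mini_pos_t;

/* Variant 1: copy first, then update the copy in place.
   Each update reloads a value that was just written. */
static void copy_then_update(mini_pos_t *next, const mini_pos_t *cur,
                             uint64_t from_to, uint64_t hash_delta)
{
    *next = *cur;
    next->w_pawns  ^= from_to;
    next->occupied ^= from_to;
    next->hash     ^= hash_delta;
}

/* Variant 2: read from the parent, update in locals, write once.
   No load depends on a store made by this function. */
static void copy_with_update(mini_pos_t *next, const mini_pos_t *cur,
                             uint64_t from_to, uint64_t hash_delta)
{
    uint64_t pawns    = cur->w_pawns  ^ from_to;
    uint64_t occupied = cur->occupied ^ from_to;
    uint64_t hash     = cur->hash     ^ hash_delta;

    next->w_pawns  = pawns;
    next->occupied = occupied;
    next->hash     = hash;
}
```

An optimizing compiler may well turn the first variant into the second for a struct this small; with a full 256-byte record and updates spread over several functions that is less likely.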

At least copy/make avoids one final, unnecessary unmake.
Anyway, I will try make/unmake myself in IsiChess soon.
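
For comparison, the two schemes can be sketched as toy search skeletons. Everything here is an assumption for illustration: pos_t, the two dummy "moves" per node and the function names are invented, and only the tdata[ply+1]=tdata[ply] copy comes from the quoted test. The point is that the copy/make loop simply returns, while make/unmake has to restore the board after every move, including the last one.

```c
#include <stdint.h>

#define MAX_PLY 128

/* Hypothetical, heavily reduced position record. */
typedef struct {
    uint64_t w_pawns;
    uint64_t hash;
} pos_t;

static pos_t tdata[MAX_PLY];  /* copy/make: one record per ply */
static pos_t board;           /* make/unmake: one global record */

/* Copy/make: the child works on its own copy; the parent is never
   modified, so nothing has to be undone. Assumes ply + 1 < MAX_PLY. */
static uint64_t search_copy_make(int ply, int depth)
{
    if (depth == 0)
        return 1;
    uint64_t nodes = 0;
    for (int m = 0; m < 2; m++) {            /* two dummy moves per node */
        uint64_t delta = (uint64_t)(m + 1) << ply;
        tdata[ply + 1] = tdata[ply];         /* copy */
        tdata[ply + 1].w_pawns ^= delta;     /* make, on the copy */
        tdata[ply + 1].hash    ^= delta;
        nodes += search_copy_make(ply + 1, depth - 1);
        /* no unmake needed */
    }
    return nodes;
}

/* Make/unmake: every make on the shared record is paired with an unmake,
   even after the last move of a node. */
static uint64_t search_make_unmake(int ply, int depth)
{
    if (depth == 0)
        return 1;
    uint64_t nodes = 0;
    for (int m = 0; m < 2; m++) {
        uint64_t delta = (uint64_t)(m + 1) << ply;
        board.w_pawns ^= delta;              /* make */
        board.hash    ^= delta;
        nodes += search_make_unmake(ply + 1, depth - 1);
        board.w_pawns ^= delta;              /* unmake */
        board.hash    ^= delta;
    }
    return nodes;
}
```

Both versions visit the same toy tree; they differ only in where the bookkeeping cost is paid.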



Last modified: Thu, 15 Apr 21 08:11:13 -0700
