Computer Chess Club Archives


Subject: Re: copy cost

Author: Johan de Koning

Date: 23:56:40 08/24/03



On August 24, 2003 at 21:50:28, Robert Hyatt wrote:

>On August 23, 2003 at 03:45:09, Johan de Koning wrote:
>
>>On August 22, 2003 at 10:45:11, Robert Hyatt wrote:
>>
>>>On August 22, 2003 at 02:53:06, Johan de Koning wrote:
>>>
>>>>On August 21, 2003 at 11:29:49, Robert Hyatt wrote:
>>>>
>>>>>On August 21, 2003 at 03:16:35, Johan de Koning wrote:
>>
>>[snip]
>>
>>>>>>Hence I dare to ask: 25% of what?
>>>>>
>>>>>NPS went _up_ by 25%+.  So total engine speed.
>>>>>
>>>>>This was changed in Crafty version 9.16, which dates back many years.
>>>>
>>>>Whoah! This is *very* hard to believe.
>>>>There must have been something severely wrong with 9.15 then (continuing cache
>>>>thrashing comes to mind, but that's just guessing). More likely, this number does
>>>>not come from a clean comparison of copy/make versus make/unmake.
>>>
>>>The _only_ change made was to replace copy/make with make/unmake.  Think about
>>>the math.
>>
>>Thinking about the math is easy. Doing the math in order to get valid results is
>>much harder since it requires facts to start with.
>>
>>>  Copy/make copies 256 bytes+.  Once _every_ node.  On today's
>>>hardware, my dual xeon 2.8, I search about 2.4M nodes per second.  or about
>>>400ns per node.  Copying 256 bytes is certainly going to show up on the radar
>>>in a significant way, when it gets done once every 400 ns.
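For readers who have not met the two schemes, the copy/make versus make/unmake trade-off described here can be sketched in C as below. This is a minimal illustration, not Crafty's code. The `position`, `move` and `undo` types are hypothetical, the board is a toy 64-byte mailbox, and castling, en passant and promotion are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define MAXPLY 64

/* Hypothetical, deliberately tiny position: a mailbox board plus side
   to move. A real engine's position is much larger (Crafty's was 256+
   bytes), which is what makes the per-node copy cost matter. */
typedef struct {
    int8_t board[64];   /* piece code per square, 0 = empty */
    int    side;        /* 0 = white, 1 = black */
} position;

typedef struct { int from, to; } move;      /* hypothetical move type */
typedef struct { int8_t captured; } undo;   /* what unmake needs back */

/* Copy/make: duplicate the position one ply up, then change the copy.
   Taking the move back is free (stack[ply] is untouched), but every
   node pays for a sizeof(position) block copy. */
void copy_make(position stack[], int ply, move m) {
    assert(ply >= 0 && ply + 1 < MAXPLY);
    stack[ply + 1] = stack[ply];            /* the block copy */
    position *p = &stack[ply + 1];
    p->board[m.to] = p->board[m.from];
    p->board[m.from] = 0;
    p->side ^= 1;
}

/* Make/unmake: change a single position in place and keep just enough
   state to reverse the move. No block copy, but unmake must reverse
   every update that make did. */
undo make_move(position *p, move m) {
    undo u = { p->board[m.to] };
    p->board[m.to] = p->board[m.from];
    p->board[m.from] = 0;
    p->side ^= 1;
    return u;
}

void unmake_move(position *p, move m, undo u) {
    p->board[m.from] = p->board[m.to];
    p->board[m.to] = u.captured;
    p->side ^= 1;
}
```

With copy/make the search simply indexes `stack[ply]`, so there is nothing to undo; with make/unmake it keeps one `position` and pairs every `make_move` with an `unmake_move`. The per-node cost under discussion is the `stack[ply + 1] = stack[ply]` line, which grows with `sizeof(position)`.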
>>
>>To start simple, at 2.4 MN/s the average node takes 833 ns, or 2333 cycles.
>>That's a fact. :-)
>
>OK..  Fact #1.  Your calculator is _broken_.  :)
>
>Enter 1 / 2400000 and hit the = button.
>
>You will get 417 nanoseconds.  You are off by a factor of two, somehow.

Hey, I was only *playing* Vincent when I entered this thread! :-)
Since I'm not really Vincent, there is no need to automatically state the
opposite of what I write.
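The arithmetic behind the correction is easy to check. The sketch below assumes the 2.8 GHz clock from the thread applies to each CPU. The post does not say whether 2.4 MN/s is the rate of one Xeon or of both together, so the `ncpus` parameter (a hypothetical knob, not from the thread) lets you try either reading.

```c
#include <assert.h>

/* Wall-clock time per node, in nanoseconds, for a given search speed.
   ncpus > 1 models the (assumed) case where nps is the combined rate
   of several CPUs searching in parallel. */
double ns_per_node(double nps, int ncpus) {
    assert(nps > 0 && ncpus > 0);
    return 1e9 * ncpus / nps;
}

/* Clock cycles available per node on one CPU running at hz. */
double cycles_per_node(double nps, int ncpus, double hz) {
    return ns_per_node(nps, ncpus) * hz / 1e9;
}
```

`ns_per_node(2.4e6, 1)` gives about 417 ns, or about 1167 cycles at 2.8 GHz. If the 2.4 MN/s were instead the combined rate of two CPUs, `ns_per_node(2.4e6, 2)` gives about 833 ns per node on each CPU, or about 2333 cycles.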

>>The next interesting fact is the time it takes to copy from cache to cache.
>>Unfortunately, I don't know this fact, so doing the math stops here (while
>>thinking about it continues :-).
>>
>>I just conducted a simple experiment on an Athlon Thunderbird 1333 MHz with my
>>engine doing about 250 kN/s. Adding an unused copy (440 bytes) to the usual
>>copy/make shows up as approx 3% in the sampling profiler (that is the single
>>instruction repe movsd). Doing the math revealed that this 3% means about 180
>>cycles per copy of 110 ints. Since I've heard that Athlon reads take 3 cycles,
>>and I've heard long ago that K6 allowed 2 reads and 2 writes at the same time,
>>it does make sense.
>>
>>So I'll venture to say it is *almost* a fact that AMDs do blockmoves at a rate
>>of 64 bit in 3 cycles. I'm still pretty factless though about blockmove speed of
>>P{I...IV}, not to mention the next generations.
>
>The issue is what did you copy?  IE the same X bytes, or something like
>this:
>
>struct blk array[64];
>
>and copy from array[i] to array[i+1] where i == ply???
>
>That has two effects.  More real memory traffic, and more cache line
>displacements.

Since I was only interested in cache to cache speed, I simply inserted
/**/ dummy[d] = pos[d];
just before the usual copy/make that starts with
/**/ pos[d+1] = pos[d];
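A self-contained version of this experiment might look like the sketch below. It assumes a hypothetical `position` of 110 ints (440 bytes, the size mentioned earlier in the thread). `time_copies` is an illustrative timing loop, not the measurement used in the post, which relied on a sampling profiler inside the real engine. For small depths the loop touches only a few kilobytes, so it mostly measures cache-to-cache copy speed, which is what the dummy copy was meant to isolate.

```c
#include <assert.h>
#include <time.h>

#define MAXPLY 64
#define POS_INTS 110          /* 110 ints = 440 bytes, as in the post */

/* Hypothetical stand-in for an engine's position: only its size matters. */
typedef struct {
    int data[POS_INTS];
} position;

position pos[MAXPLY];         /* copy/make stack, one entry per ply        */
position dummy[MAXPLY];       /* never read; global so the store survives  */

/* Copy/make step at depth d. With extra_copy set, one additional
   440-byte copy is done first, mirroring "dummy[d] = pos[d];". */
void copy_make(int d, int extra_copy) {
    assert(d >= 0 && d + 1 < MAXPLY);
    if (extra_copy)
        dummy[d] = pos[d];
    pos[d + 1] = pos[d];
    pos[d + 1].data[d % POS_INTS] += 1;   /* stand-in for "make the move" */
}

/* Crude timing loop: walk plies 0..depth-1 repeatedly.
   Returns elapsed CPU time in seconds. */
double time_copies(long iterations, int depth, int extra_copy) {
    assert(depth > 0 && depth < MAXPLY);
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        for (int d = 0; d < depth; d++)
            copy_make(d, extra_copy);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing `time_copies(n, depth, 0)` with `time_copies(n, depth, 1)` for a large `n` gives a rough per-copy cost; a real search will also feel indirect effects (extra cache pressure, later write-backs) that this loop cannot show.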

Compared to your example of a parallel array (or an inflated struct position),
the cache-to-cache (C2C) speed should be the same. Displacements look cheaper in
my test, but I think they're the same on average (not tested, hence possible
oversights on my end). Actually, I think displacements are always cheap in a
sensible chess program on 21st-century machinery, because most of the data is
never dirty. TransWhatever tables are the obvious counterexample, but they are
(should be) typically restricted to a small amount of data, touched once per node.
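The "almost fact" from the earlier post (AMD chips moving 64 bits every 3 cycles cache to cache) can be turned into a back-of-the-envelope cost model. In the sketch below the throughput is a parameter, because the thread explicitly has no figure for Pentium-class chips; the function names are hypothetical.

```c
#include <assert.h>

/* Estimated cycles for a cache-to-cache block copy, given a throughput
   expressed as cycles per 8 bytes (3.0 is the AMD guess from the
   thread). Partial 8-byte chunks are rounded up. */
double copy_cycles(unsigned bytes, double cycles_per_8_bytes) {
    assert(cycles_per_8_bytes > 0);
    unsigned chunks = (bytes + 7) / 8;
    return chunks * cycles_per_8_bytes;
}

/* Fraction of a node's cycle budget spent on one such copy per node. */
double copy_fraction(unsigned bytes, double cycles_per_8_bytes,
                     double nps, double hz) {
    assert(nps > 0 && hz > 0);
    double budget = hz / nps;            /* cycles available per node */
    return copy_cycles(bytes, cycles_per_8_bytes) / budget;
}
```

`copy_cycles(440, 3.0)` returns 165 cycles, and `copy_fraction(440, 3.0, 250e3, 1333e6)` comes out at about 0.031 of a 5332-cycle node budget, close to the roughly 3% the profiler showed for the extra copy.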

>>>  Yes, when you do
>>>a copy, it will start off cache-to-cache.  But once you do that cache-to-cache
>>>copy you are committed to eventually doing a cache-to-memory write-back.
>>
>>This is the final interesting fact.
>>But also the most non-fact, since it depends on almost everything.
>>
>>In my little experiment above, the extra copy took only 3%, but the actual run
>>time went up 5.5%. This may not mean much because 1 extra line in main() can
>>easily change the runtime by 1 or 2% (for reasons I haven't fathomed yet). It
>>may also mean that data cache is actually getting thrashed, and I'm lucky not to
>>use large tables on a regular basis.
>>
>>... Johan
>
>I'm probably more dependent on cache with large bitmap tables to index all
>the time for proper masks...  IE move generation, etc...

I think you are; that's why I called it a non-fact.
But maybe you should get rid of all those tables. :-)
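As an illustration of what getting rid of tables could mean for the simplest bitboard masks: single-square, rank and file masks can be computed with a shift instead of loaded from memory. The sketch below is hypothetical (not Crafty's code) and assumes squares numbered 0..63 with rank = sq / 8 and file = sq % 8. Whether the computed version is faster depends on the CPU and on whether the tables stay in cache anyway.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bitboard;

/* Table-driven masks (hypothetical names), one lookup per use. */
static bitboard set_mask[64], rank_mask[64], file_mask[64];

void init_masks(void) {
    for (int sq = 0; sq < 64; sq++) {
        set_mask[sq] = 1ULL << sq;
        rank_mask[sq] = 0;
        file_mask[sq] = 0;
        for (int i = 0; i < 64; i++) {
            if (i / 8 == sq / 8) rank_mask[sq] |= 1ULL << i;
            if (i % 8 == sq % 8) file_mask[sq] |= 1ULL << i;
        }
    }
}

/* The same masks computed on the fly: a shift or two, no memory access. */
static inline bitboard square_bit(int sq) { return 1ULL << sq; }
static inline bitboard rank_bits(int sq)  { return 0xFFULL << (8 * (sq >> 3)); }
static inline bitboard file_bits(int sq)  { return 0x0101010101010101ULL << (sq & 7); }
```

After `init_masks()`, each computed mask equals the corresponding table entry for every square, so the two forms are interchangeable; only the cost profile differs.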

... Johan




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.