Author: Robert Hyatt
Date: 18:50:28 08/24/03
On August 23, 2003 at 03:45:09, Johan de Koning wrote:
>On August 22, 2003 at 10:45:11, Robert Hyatt wrote:
>
>>On August 22, 2003 at 02:53:06, Johan de Koning wrote:
>>
>>>On August 21, 2003 at 11:29:49, Robert Hyatt wrote:
>>>
>>>>On August 21, 2003 at 03:16:35, Johan de Koning wrote:
>
>[snip]
>
>>>>>Hence I dare to ask: 25% of what?
>>>>
>>>>NPS went _up_ by 25%+. So total engine speed.
>>>>
>>>>This was changed in Crafty version 9.16, which dates back many years.
>>>
>>>Whoah! This is *very* hard to believe.
>>>There must have been something severely wrong with 9.15 then (continuing cache
>>>thrashing comes to mind, but that's just guessing). More likely, this number does
>>>not come from a clean comparison of copy/make versus make/unmake.
>>
>>The _only_ change made was to replace copy/make with make/unmake. Think about
>>the math.
>
>Thinking about the math is easy. Doing the math in order to get valid results is
>much harder since it requires facts to start with.
>
>> Copy/make copies 256 bytes+. Once _every_ node. On today's
>>hardware, my dual Xeon 2.8, I search about 2.4M nodes per second, or about
>>400 ns per node. Copying 256 bytes is certainly going to show up on the radar
>>in a significant way, when it gets done once every 400 ns.
>
>To start simple, at 2.4 MN/s the average node takes 833 ns, or 2333 cycles.
>That's a fact. :-)
OK.. Fact #1. Your calculator is _broken_. :)
Enter 1 / 2400000 and hit the = button.
You will get 417 nanoseconds. You are off by a factor of two, somehow.
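Spelled out as a trivial C snippet, using the same assumed figures quoted above
(2.4M nodes per second, a 2.8 GHz Xeon), just as a sketch:

  /* sketch only: per-node time from NPS, using the figures from the post */
  #include <stdio.h>

  int main(void) {
      double nps = 2400000.0;                  /* nodes per second */
      double ns_per_node = 1.0e9 / nps;        /* ~416.7 ns per node */
      double ghz = 2.8;                        /* dual Xeon 2.8 */
      printf("%.1f ns/node, %.0f cycles/node\n",
             ns_per_node, ns_per_node * ghz);  /* ~417 ns, ~1167 cycles */
      return 0;
  }

That comes out to roughly 417 ns and 1167 cycles per node, not 833 ns / 2333.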
>
>The next interesting fact is the time it takes to copy from cache to cache.
>Unfortunately, I don't know this fact, so doing the math stops here (while
>thinking about it continues :-).
>
>I just conducted a simple experiment on an Athlon Thunderbird 1333 MHz with my
>engine doing about 250 kN/s. Adding an unused copy (440 bytes) to the usual
>copy/make shows up as approx 3% in the sampling profiler (that is the single
>instruction repe movsd). Doing the math revealed that this 3% means about 180
>cycles per copy of 110 ints. Since I've heard that Athlon reads take 3 cycles,
>and I heard long ago that the K6 allowed 2 reads and 2 writes at the same time,
>it does make sense.
>
>So I'll venture to say it is *almost* a fact that AMDs do blockmoves at a rate
>of 64 bits in 3 cycles. I'm still pretty factless, though, about the blockmove
>speed of P{I...IV}, not to mention the next generations.
>P{I...IV}, not to mention the next generations.
The issue is what you copied. I.e., the same X bytes every time, or something
like this:

  struct blk array[64];

copying from array[i] to array[i+1], where i == ply?

That has two effects: more real memory traffic, and more cache line
displacements.
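For concreteness, here is a minimal sketch of that array-per-ply style of
copy/make (the struct layout and the 256-byte size are made up for the example;
this is not Crafty's actual code):

  #include <string.h>

  #define MAX_PLY 64

  typedef struct {
      unsigned char board[64];      /* piece placement */
      int side, castle, ep, fifty;  /* a few state fields */
      char other[176];              /* filler so sizeof() comes to 256 bytes */
  } POSITION;

  static POSITION tree[MAX_PLY];

  /* copy/make: copy the whole position down one ply, then edit the copy.
     "Unmake" is free (just step ply back), but every node pays for this copy. */
  static void make_move_copy(int ply /*, int move */) {
      memcpy(&tree[ply + 1], &tree[ply], sizeof(POSITION));
      /* ... apply the move to tree[ply + 1] ... */
  }

With 64-byte cache lines, that copy touches 4 lines at the source and 4 more at
the destination every node, which is where the extra traffic and the line
displacement come from.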
>
>> Yes, when you do
>>a copy, it will start off cache-to-cache. But once you do that cache-to-cache
>>copy you are committed to eventually doing a cache-to-memory write-back.
>
>This is the final interesting fact.
>But also the most non-fact, since it depends on almost everything.
>
>In my little experiment above, the extra copy took only 3%, but the actual run
>time went up 5.5%. This may not mean much because 1 extra line in main() can
>easily change the runtime by 1 or 2% (for reasons I haven't fathomed yet). It
>may also mean that data cache is actually getting trashed, and I'm lucky not to
>use large tables on a regular basis.
>
>... Johan
I'm probably more dependent on cache, with large bitmap tables that get indexed
all the time for the proper masks... i.e. move generation, etc.
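As a rough illustration of the kind of table traffic I mean (the table names
and contents here are invented for the example, not Crafty's actual tables):

  typedef unsigned long long BITBOARD;

  /* precomputed attack masks, filled in at program startup */
  BITBOARD knight_attacks[64];   /* 512 bytes */
  BITBOARD king_attacks[64];     /* 512 bytes */

  /* move generation indexes these tables at every node, so they need to
     stay cached; a 256-byte copy per node competes for the same lines. */
  BITBOARD knight_moves(int sq, BITBOARD own_pieces) {
      return knight_attacks[sq] & ~own_pieces;
  }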