Author: Robert Hyatt
Date: 08:32:10 08/25/03
On August 25, 2003 at 02:56:40, Johan de Koning wrote:

>On August 24, 2003 at 21:50:28, Robert Hyatt wrote:
>
>>On August 23, 2003 at 03:45:09, Johan de Koning wrote:
>>
>>>On August 22, 2003 at 10:45:11, Robert Hyatt wrote:
>>>
>>>>On August 22, 2003 at 02:53:06, Johan de Koning wrote:
>>>>
>>>>>On August 21, 2003 at 11:29:49, Robert Hyatt wrote:
>>>>>
>>>>>>On August 21, 2003 at 03:16:35, Johan de Koning wrote:
>>>
>>>[snip]
>>>
>>>>>>>Hence I dare to ask: 25% of what?
>>>>>>
>>>>>>NPS went _up_ by 25%+. So total engine speed.
>>>>>>
>>>>>>This was changed in Crafty version 9.16, which dates back many years.
>>>>>
>>>>>Whoah! This is *very* hard to believe.
>>>>>There must have been something severely wrong with 9.15 then (continual cache
>>>>>thrashing comes to mind, but that's just guessing). More likely, this number
>>>>>does not come from a clean comparison of copy/make versus make/unmake.
>>>>
>>>>The _only_ change made was to replace copy/make with make/unmake. Think about
>>>>the math.
>>>
>>>Thinking about the math is easy. Doing the math in order to get valid results
>>>is much harder, since it requires facts to start with.
>>>
>>>>Copy/make copies 256+ bytes, once at _every_ node. On today's hardware, my
>>>>dual Xeon 2.8, I search about 2.4M nodes per second, or about 400 ns per node.
>>>>Copying 256 bytes is certainly going to show up on the radar in a significant
>>>>way when it gets done once every 400 ns.
>>>
>>>To start simple, at 2.4 MN/s the average node takes 833 ns, or 2333 cycles.
>>>That's a fact. :-)
>>
>>OK... Fact #1. Your calculator is _broken_. :)
>>
>>Enter 1 / 2400000 and hit the = button.
>>
>>You will get 417 nanoseconds. You are off by a factor of two, somehow.
>
>Hey, I was only *playing* Vincent when I entered this thread! :-)
>Since I'm not really Vincent, there is no need to automatically state the
>opposite of what I write. :)
>
>>>The next interesting fact is the time it takes to copy from cache to cache.
>>>Unfortunately, I don't know this fact, so doing the math stops here (while
>>>thinking about it continues :-).
>>>
>>>I just conducted a simple experiment on an Athlon Thunderbird 1333 MHz with my
>>>engine doing about 250 kN/s. Adding an unused copy (440 bytes) to the usual
>>>copy/make shows up as approx. 3% in the sampling profiler (that is, the single
>>>instruction repe movsd). Doing the math revealed that this 3% means about 180
>>>cycles per copy of 110 ints. Since I've heard that Athlon reads take 3 cycles,
>>>and I heard long ago that the K6 allowed 2 reads and 2 writes at the same time,
>>>it does make sense.
>>>
>>>So I'll venture to say it is *almost* a fact that AMDs do block moves at a rate
>>>of 64 bits in 3 cycles. I'm still pretty factless, though, about the block-move
>>>speed of P{I...IV}, not to mention the next generations.
>>
>>The issue is what did you copy? I.e., the same X bytes, or something like
>>this:
>>
>>struct blk array[64]
>>
>>and copy from array[i] to array[i+1] where i == ply???
>>
>>That has two effects: more real memory traffic, and more cache-line
>>displacements.
>
>Since I was only interested in cache-to-cache speed, I simply inserted
>/**/ dummy[d] = pos[d];
>just before the usual copy/make that starts with
>/**/ pos[d+1] = pos[d];

There is a danger in your test. It is _possible_ that a good compiler will
recognize that code as "useless" and toss it out silently. I see this happening
all the time with gcc and the Intel compiler I use under Linux. If you do
something like a=b, and a is not global and is not used anywhere further down
in the procedure, it will get eliminated. If a is global, it has to do the copy
of course, as the compiler can't tell who might use that value in another
procedure.

>Compared to your example of a parallel array (or inflated struct position)
>the C2C speed should be the same.
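[The dead-store warning above is worth making concrete. A minimal sketch, with hypothetical names and the 440-byte size from Johan's experiment, not the actual engine code: making the dummy destination global is one way to keep a compiler from silently discarding the extra copy, for exactly the reason Bob gives.]

```c
#include <string.h>

#define POS_BYTES 440                /* position size from Johan's experiment */

static char pos[64][POS_BYTES];      /* one position per ply (hypothetical layout) */
static char dummy[POS_BYTES];        /* global destination: the compiler cannot
                                        prove the store dead, so the extra copy
                                        survives optimization */

void copy_make(int d) {
    memcpy(dummy, pos[d], POS_BYTES);        /* the extra copy under test */
    memcpy(pos[d + 1], pos[d], POS_BYTES);   /* the usual copy/make */
}
```

[Had dummy been a local that is never read afterwards, an optimizing compiler would be free to delete the first memcpy entirely, and the experiment would be measuring nothing.]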
>Displacements look cheaper in my test, but I think they're the same on average
>(not tested, hence possible oversights on my end). Actually, I think
>displacements are always cheap in a sensible chess program on 21st-century
>machinery, because most of the data is never dirty. TransWhatever tables are
>the obvious counterexample, but they are (should be) typically restricted to
>small data and once per node.
>
>>>>Yes, when you do a copy, it will start off cache-to-cache. But once you do
>>>>that cache-to-cache copy you are committed to eventually doing a
>>>>cache-to-memory write-back.
>>>
>>>This is the final interesting fact.
>>>But also the most non-fact, since it depends on almost everything.
>>>
>>>In my little experiment above, the extra copy took only 3%, but the actual
>>>run time went up 5.5%. This may not mean much, because 1 extra line in main()
>>>can easily change the runtime by 1 or 2% (for reasons I haven't fathomed
>>>yet). It may also mean that the data cache is actually getting thrashed, and
>>>I'm lucky not to use large tables on a regular basis.
>>>
>>>... Johan
>>
>>I'm probably more dependent on cache, with large bitmap tables to index all
>>the time for proper masks... i.e. move generation, etc...
>
>I think you are, that's why I called it a non-fact.
>But maybe you should get rid of all those tables. :-)
>
>... Johan

Maybe you should add 'em. :)
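[For readers joining the thread late, the two schemes under debate can be sketched as follows. This is a toy board representation with hypothetical names, not Crafty's or The King's actual structures; it shows only the structural difference, not the measured cost.]

```c
typedef struct {
    unsigned char board[64];   /* 0 = empty, otherwise a piece code */
    int side_to_move;
} Position;

/* copy/make: duplicate the whole position into the next ply, then edit the
   copy; undo is free (just step back a ply), but every node pays for the
   block copy */
Position stack[64];

void make_copy(int ply, int from, int to) {
    stack[ply + 1] = stack[ply];   /* the per-node struct copy under discussion */
    stack[ply + 1].board[to] = stack[ply + 1].board[from];
    stack[ply + 1].board[from] = 0;
    stack[ply + 1].side_to_move ^= 1;
}

/* make/unmake: edit one position in place, saving just enough state to
   reverse the move on the way back up the tree */
typedef struct { int from, to; unsigned char captured; } Undo;

void make(Position *p, Undo *u, int from, int to) {
    u->from = from; u->to = to; u->captured = p->board[to];
    p->board[to] = p->board[from];
    p->board[from] = 0;
    p->side_to_move ^= 1;
}

void unmake(Position *p, const Undo *u) {
    p->board[u->from] = p->board[u->to];
    p->board[u->to] = u->captured;
    p->side_to_move ^= 1;
}
```

[The 25% NPS claim for Crafty 9.16 is about swapping the first scheme for the second; whether the trade wins depends on position size and cache behavior, which is the whole argument above.]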