Computer Chess Club Archives


Subject: Re: The need to unmake move

Author: Robert Hyatt

Date: 14:35:09 08/25/03


On August 25, 2003 at 17:22:52, Sune Fischer wrote:

>On August 25, 2003 at 16:50:08, Dan Andersson wrote:
>
>> The issue is the same. Because you can't guarantee that copying will be in
>>cache. And you can't guarantee that other data structures won't be close or
>>aligned in such a way that it won't trash the cache. The impact might not be
>>great but it will be there. So the net cache bandwidth will be lower or even
>>much lower than the simple linear relationship. Thus the slow main memory
>>bottleneck will appear.
>
>As far as the whole picture goes, probably yes, but it is not easy to figure
>that out, since there are many factors.
>
>I know this is down to hair splitting now, but IMO the reason that unmaking is
>faster than uncopying isn't the one Bob gave, and I quote:
>
>"
>>>I was thinking more about how silly it is to copy the empty bitboards for each
>>>ply. If you update the boards that are active, they will stay in the cache.
>>>Those that are not used might drop out, unless they are copied once every
>>>microsecond.
>>>
>>
>>That is a reasonable rate for a program that searches 1M nodes per second.  I'm
>>going at 2.4M so make that about once every 400 nanoseconds.  :)  Suddenly it
>>begins to add up in a big way.  :)"
>
>As though the 2.4 Mnps was the reason.

No.  The 2.4M nps simply gives a frequency: one node roughly every 400ns.
That is _my_ program's rate on my dual 2.8GHz box.  It gives me a _specific_
time per node, and it is pretty easy to estimate that copy/make is going to be
a significant part of that...

I was not saying that 2.4M nodes per second is the reason it fails for me,
particularly.  I simply said that I search a node every 400+ ns, which means
I have to do a copy/make every 400+ ns.  That's a lot of bandwidth, more than
the PC really has.
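
To put rough numbers on that, here is a minimal sketch in C.  The function
names are made up for illustration, and the 256-byte position size used in the
note below is purely an assumption, not Crafty's actual structure size.

```c
#include <assert.h>

/* Time budget per node, in nanoseconds, at a given search speed. */
double ns_per_node(double nodes_per_second) {
    assert(nodes_per_second > 0.0);
    return 1e9 / nodes_per_second;
}

/* Bytes written per second if one full position is copied at every node.
   position_bytes is a HYPOTHETICAL size chosen by the caller. */
double copy_bytes_per_second(double nodes_per_second, double position_bytes) {
    assert(nodes_per_second > 0.0 && position_bytes >= 0.0);
    return nodes_per_second * position_bytes;
}
```

At 2.4M nodes per second, ns_per_node() gives about 416.7 ns.  With the
assumed 256-byte position, copy_bytes_per_second() gives 614,400,000 bytes
per second of writes, before counting the matching reads or anything else
the search touches.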

The dual actually makes this worse than a single CPU, as I said: there are two
caches, each snooping the other's writes and invalidating lines in its own
cache that the other processor just modified.


>The reason is that double stacks increase memory traffic _between CPUs_, but that
>is _not_ what he said if you follow the thread, and this thread wasn't about SMP
>at all, so if that was his point I'm not sure how it related to the discussion.

It was about Crafty and copy/make.  As I said, if I run at 2.4M nodes per
second, I have to do a copy/make every 400 ns.  Whether I have one processor
or 1024 processors, that won't change.  And, in fact, on the dual it is harder
to do that than on a single because of snooping.



>
>He gave numbers of 25%; nobody can confirm those numbers (I get ~10%), but I
>figure now that he was talking about 25% in SMP search, or what?

No.  Crafty version 9 was not SMP.  The first SMP version was 15.0.  The
copy/make was dropped in version 9, and it produced a 25% speedup, no more,
no less.  For Crafty specifically.  That's all I can say with any certainty.
And I don't claim it is 25% for _other_ programs.  Only that it was 25% for
Crafty, and that was the _only_ change to the program: going from copy/make to
make/unmake.
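
For anyone following along, the difference between the two schemes can be
sketched like this.  It is a toy illustration in C, not Crafty's code: the
Position and Move types, the one-byte-per-square board, and all function names
are hypothetical, and castling, en passant, and promotion are ignored.

```c
#include <assert.h>

#define MAX_PLY 64

/* HYPOTHETICAL simplified position: one byte per square, 0 = empty. */
typedef struct {
    unsigned char square[64];
} Position;

/* HYPOTHETICAL move record; 'captured' is filled in by make_move(). */
typedef struct {
    int from, to;
    unsigned char captured;
} Move;

/* Copy/make: duplicate the whole position into the next ply's slot, then
   apply the move to the copy.  Backing up is free (the old ply is still
   intact), but every node pays for a full structure copy. */
void copy_make(Position stack[], int ply, const Move *m) {
    assert(ply >= 0 && ply + 1 < MAX_PLY);
    stack[ply + 1] = stack[ply];
    stack[ply + 1].square[m->to] = stack[ply + 1].square[m->from];
    stack[ply + 1].square[m->from] = 0;
}

/* Make/unmake: change one shared position in place, remembering only what
   is needed to undo the move. */
void make_move(Position *p, Move *m) {
    m->captured = p->square[m->to];
    p->square[m->to] = p->square[m->from];
    p->square[m->from] = 0;
}

/* Restore the position to what it was before make_move(). */
void unmake_move(Position *p, const Move *m) {
    p->square[m->from] = p->square[m->to];
    p->square[m->to] = m->captured;
}
```

In the first scheme the memory written per node grows with the size of the
whole position; in the second only the squares the move touches are written,
at the cost of doing the undo work explicitly.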


>
>Suddenly the whole thing is rather confusing because his numbers don't compare
>with non-SMP numbers, and I believe Johan was talking strictly non-SMP.

I don't think there are _any_ differences between SMP and non-SMP in this
regard, other than possibly SMP is worse, rather than being better as you
suggested (dual caches, etc).


>
>But anyway, this is getting silly. ;-)
>
>-S.
>>MvH Dan Andersson



Last modified: Thu, 15 Apr 21 08:11:13 -0700
