Computer Chess Club Archives


Subject: Re: The need to unmake move

Author: Sune Fischer

Date: 15:00:49 08/25/03


On August 25, 2003 at 17:35:09, Robert Hyatt wrote:

>On August 25, 2003 at 17:22:52, Sune Fischer wrote:
>
>>On August 25, 2003 at 16:50:08, Dan Andersson wrote:
>>
>>> The issue is the same. Because you can't guarantee that copying will be in
>>>cache. And you can't guarantee that other data structures won't be close or
>>>aligned in such a way that it won't trash the cache. The impact might not be
>>>great but it will be there. So the net cache bandwidth will be lower or even
>>>much lower than the simple linear relationship. Thus the slow main memory
>>>bottleneck will appear.
>>
>>As far as the whole picture goes, probably yes, but it is not easy to figure
>>that out since there are many factors.
>>
>>I know this is down to hair splitting now, but IMO the reason that unmaking is
>>faster than uncopying isn't the one Bob gave, and I quote:
>>
>>"
>>>>I was thinking more about how silly it is to copy the empty bitboards for each
>>>>ply. If you update the boards that are active, they will stay in the cache.
>>>>Those that are not used might drop out, unless they are copied once every micro
>>>>second.
>>>>
>>>
>>>That is a reasonable rate for a program that searches 1M nodes per second.  I'm
>>>going at 2.4M so make that about once every 400 nanoseconds.  :)  Suddenly it
>>>begins to add up in a big way.  :)"
>>
>>As though the 2.4 Mnps was the reason.
>
>No.  The 2.4M nps simply gives a frequency, roughly 400ns.  Which is _my_
>program's frequency on my dual 2.8 GHz box.  That gives me a _specific_ time
>per node, and it is pretty easy to estimate that copy/make is going to be
>a significant part of that...

I'm not fond of this way of thinking. On a dual you don't actually do one node
per 400 ns; you do two nodes per 800 ns, one on each processor.

For one thing, your method gets confusing if you want to, e.g., count clocks per
node.
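
To put rough numbers on that, here is a small sketch in C. It assumes the nodes
are split evenly between the processors, which is only an approximation for a
real SMP search, and the function names are made up for illustration.

```c
/* Wall-clock nanoseconds between nodes for the machine as a whole. */
double ns_per_node_total(double nodes_per_sec) {
    return 1e9 / nodes_per_sec;
}

/* Nanoseconds one processor spends on each of its own nodes,
   ASSUMING the nodes are split evenly across ncpus processors. */
double ns_per_node_per_cpu(double nodes_per_sec, int ncpus) {
    return 1e9 * ncpus / nodes_per_sec;
}

/* Clock cycles one processor has available per node at clock_hz,
   under the same even-split assumption. */
double clocks_per_node(double nodes_per_sec, int ncpus, double clock_hz) {
    return clock_hz * ncpus / nodes_per_sec;
}
```

With the figures from this thread (2.4M nps, two processors, 2.8 GHz), the
machine as a whole finishes a node about every 417 ns, but each processor
spends about 833 ns, or roughly 2333 clocks, on each of its own nodes.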

>I was not saying that 2.4M nodes per second is the reason it fails for me,
>particularly.  I simply said that I search a node per 400+ ns, which means
>I have to do a copy/make every 400+ ns.  That's a lot of bandwidth.  That the
>PC doesn't really have.

But you have a "double" PC when talking cache bandwidth...
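
A back-of-the-envelope sketch of what I mean, in C. The size of the copied
state is a parameter because I don't know Crafty's real figure; the 256 bytes
used in the note below is purely a made-up example, and the even split across
processors is again an assumption.

```c
/* Bytes per second that one processor's cache has to move for copy/make:
   every node reads the parent state once and writes the child copy once.
   state_bytes is a HYPOTHETICAL parameter, not a measured Crafty value.
   Coherency (snooping) traffic between the caches is NOT counted here. */
double copy_bytes_per_sec_per_cpu(double nodes_per_sec, int ncpus,
                                  double state_bytes) {
    return 2.0 * state_bytes * nodes_per_sec / ncpus;
}
```

For an assumed 256-byte state at 2.4M nps on two processors, this gives about
614 MB/s of copy traffic per cache, half of what a single cache would see at
the same total node rate.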

>The dual actually makes this worse than a single cpu, as I said, due to two
>caches, snooping writes, and invalidating things in their own cache that the
>other processor just modified in the other cache.

That is true, it is all _very_ confusing.. :)

>It was about crafty and copy/make.  As I said, if I run at 2.4M nodes per
>second, I have to do a copy/make every 400 ns.  Whether I have one processor
>or 1024 processors, that won't change.

Well, if you have 1024 processors, each processor only has to do 1/1024 of the
work on the numbers you post, so I'd say it does change something.

> And, in fact, on the dual it is harder
>to do that than on a single because of snooping.

yes, I do understand that :)

>>
>>He gave numbers of 25%, nobody can confirm those numbers (I get ~10%), but I
>>figure now that he was talking 25% in SMP search, or what?
>
>No.  Crafty version 9 was not SMP.  The first SMP version was 15.0.  The
>copy/make was dropped in version 9, and it produced a 25% speedup, no more,
>no less.  For Crafty specifically.  That's all I can say with any certainty.
>And I don't claim it is 25% for _other_ programs.  Only that it was 25% for
>Crafty, and that was the _only_ change to the program.  Going from Copy/Make to
>Make/Unmake.

If this is true, I think you must have been thrashing the cache badly.
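
For anyone following along, the two schemes being compared look roughly like
this. It is a toy sketch in C, not Crafty's code: the `Position` layout, the
`Move` type and all function names are hypothetical, and only quiet
(non-capturing) moves are handled.

```c
#include <string.h>

#define MAX_PLY 64

/* Toy position: hypothetical layout, far smaller than a real engine's. */
typedef struct {
    unsigned long long occupied;   /* one bit per occupied square */
    int side_to_move;              /* 0 = white, 1 = black */
} Position;

typedef struct {
    int from, to;                  /* square indices 0..63 */
} Move;

/* Apply a quiet move in place. */
static void apply_move(Position *p, Move m) {
    p->occupied &= ~(1ULL << m.from);
    p->occupied |= 1ULL << m.to;
    p->side_to_move ^= 1;
}

/* Copy/make: copy the parent into the next ply's slot, then modify the copy.
   Nothing is ever undone; backing up just means using stack[ply] again. */
void copy_make(Position stack[MAX_PLY], int ply, Move m) {
    memcpy(&stack[ply + 1], &stack[ply], sizeof(Position));
    apply_move(&stack[ply + 1], m);
}

/* Make/unmake: a single position is updated in place... */
void make_move(Position *p, Move m) {
    apply_move(p, m);
}

/* ...and restored by reversing the update. A real unmake would also need
   saved information (captured piece, castling rights, and so on). */
void unmake_move(Position *p, Move m) {
    p->occupied &= ~(1ULL << m.to);
    p->occupied |= 1ULL << m.from;
    p->side_to_move ^= 1;
}
```

The copy version writes the whole structure at every node, while the in-place
version only touches the words that actually change, which is where the
difference in cache traffic comes from.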

>I don't think there are _any_ differences between SMP and non-SMP in this
>regard, other than possibly SMP is worse, rather than being better as you
>suggested (dual caches, etc).

Yes, it didn't quite end up sounding the way I meant it :)

-S.



Last modified: Thu, 15 Apr 21 08:11:13 -0700