Computer Chess Club Archives



Subject: Re: The need to unmake move

Author: Robert Hyatt

Date: 19:50:21 09/02/03


On September 02, 2003 at 18:22:25, Jeremiah Penery wrote:

>On September 02, 2003 at 11:06:18, Robert Hyatt wrote:
>
>>On September 02, 2003 at 00:02:34, Jeremiah Penery wrote:
>>
>>>On September 01, 2003 at 23:39:13, Robert Hyatt wrote:
>>>
>>>>On August 29, 2003 at 18:32:46, Jeremiah Penery wrote:
>>>>
>>>>>Of course you can do a lot better - all I'm saying is that there's no way you're
>>>>>going to be doing worse.
>>>>
>>>>I don't remember saying I would be doing worse.  I remember saying I would
>>>>be doing _bad_.  Because potentially all memory references would be non-local.
>>>
>>>If you'd be doing "_bad_" in that case, how would you say you're doing now with
>>>SMP, where _every_ access is *slower than worst case* on that Opteron machine?
>>
>>
>>Somehow we are experiencing "a failure to communicate" (cool hand luke quote).
>>
>>My first port to the Cray resulted in a program that ran at 1K nodes per
>>second in 1981.  The previous machine was doing about 100 nodes per second,
>>so that was a gain.  On that same machine, 5-6 years later we were doing
>>20K nodes per second.
>>
>>I'd call 1K _BAD_.
>>
>>Even though it was faster than we had gone previously.
>
>That's a much different situation, because you moved to a completely different
>architecture with much more possibility.  You're not going to see a 20x speedup
>on Opteron for Crafty just by optimizing some memory accesses.  You'd be lucky
>to get 20%, IMO.
>
>If I somehow knew that a 4x Opteron 2GHz machine could get 20M NPS in Crafty,
>and you were getting only 3M, I'd agree that it was bad.  But that is very far
>from the actual case.


I have no real Opteron numbers yet.  But let me make up a couple of numbers
just for discussion.

local memory = 80ns (I hope, but don't expect to see that.)
memory one hop away = 110ns.  I assume a max of 1 hop for small
numbers of processors (2, for example).

If I put a split block in local memory, it will run at a latency of 80ns.  But
the current code puts them all in one memory, so one processor runs at 80ns
less collision loss, and the other runs at 110ns (again less collision loss).
That is a significant loss.  Go to 4 processors, which I use all the time, and
it gets worse.  No, it isn't a factor of 2.  But as I said, I scrap for every
2% speedup I can find.  For 20-30-40% I would jump through _many_ hoops...
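
To put rough numbers on the argument above, here is a minimal sketch of the arithmetic, using the same made-up latencies (80ns local, 110ns one hop away). It assumes every split block sits in one processor's memory, that every other processor is exactly one hop away (which probably understates the 4-processor case), and it ignores collision loss entirely. The function names are hypothetical and nothing here is measured; it only describes the latency of these memory references, not overall nodes per second.

```python
# Made-up latencies from the discussion above, not measurements.
LOCAL_NS = 80.0     # assumed latency to a processor's own memory
ONE_HOP_NS = 110.0  # assumed latency to memory one hop away


def mean_latency_centralized(num_cpus, local_ns=LOCAL_NS, remote_ns=ONE_HOP_NS):
    """Mean split-block latency when all split blocks live on one node.

    One processor sees local latency; the other (num_cpus - 1) processors
    are assumed to be exactly one hop away. Collision loss is ignored.
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    return (local_ns + (num_cpus - 1) * remote_ns) / num_cpus


def overhead_vs_local(num_cpus, local_ns=LOCAL_NS, remote_ns=ONE_HOP_NS):
    """Fractional extra latency compared with every access being local."""
    return mean_latency_centralized(num_cpus, local_ns, remote_ns) / local_ns - 1.0


if __name__ == "__main__":
    for cpus in (2, 4):
        mean_ns = mean_latency_centralized(cpus)
        extra = overhead_vs_local(cpus)
        print(f"{cpus} cpus: mean {mean_ns:.1f}ns, {extra:.1%} above all-local")
```

Under these assumptions the 2-processor case averages 95ns against 80ns for all-local placement, and the 4-processor case averages 102.5ns, which is consistent with "no, it isn't a factor of 2" while still being well above the 2% level worth chasing.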




Last modified: Thu, 15 Apr 21 08:11:13 -0700
