Computer Chess Club Archives

Subject: Re: Intel four-way 2.8 GHz system is just Amazing! - Not hardly

Author: Robert Hyatt
Date: 07:30:55 11/13/03
On November 13, 2003 at 01:24:56, Aaron Gordon wrote:

>On November 12, 2003 at 13:37:55, Robert Hyatt wrote:
>
>>On November 11, 2003 at 19:58:36, Aaron Gordon wrote:
>>
>>>On November 11, 2003 at 17:29:25, Richard Pijl wrote:
>>>
>>>>On November 11, 2003 at 08:50:53, Aaron Gordon wrote:
>>>>
>>>>>It would be better had they used a quad or 8-way Opteron running at 2 GHz or
>>>>>more. From some testing I've done in the past, you can figure a single Opteron
>>>>>2 GHz == a P4 3.6 GHz in Fritz 8 (32-bit mode). So a quad Opteron 2.0 == a quad
>>>>>P4 3.6, which is almost 30% faster than a quad 2.8 GHz machine (3.6/2.8 ≈ 1.29),
>>>>>and the memory bandwidth available would probably push it a bit over that with
>>>>>large hash table sizes. An 8-way Opteron 2.0 would of course be like eight
>>>>>P4 3.6s (but with some 40 GB/s+ of memory bandwidth available, depending on bus
>>>>>speed).
>>>>>
>>>>>Why not use the best hardware? If you want to promote your new 'awesome'
>>>>>chess program, it seems you'd want to give it the best chance of winning.
>>>>
>>>>As far as I know, Fritz is written (mainly) in assembly, so using the Opteron's
>>>>additional registers and so on is probably not feasible. Would the Opteron be
>>>>much faster in this case too?
>>>>
>>>>Richard
>>>
>>>As for programming it specifically for the Opteron, I have no idea. The Opteron
>>>is about 25-30% faster than an Athlon XP in Deep Fritz 7 / Fritz 8, though, even
>>>without any code modification.
>>>
>>>I have high hopes for hand optimizations for the Opteron. We'll have to see how
>>>it turns out though. Hyatt, Christophe, and those guys could definitely give you
>>>more insight on this matter than I could.
>>
>>
>>I believe the main problem with the quad Opteron is the cache coherency
>>mechanism (MOESI) that AMD uses; Intel's Xeons use plain MESI instead. In a
>>NUMA architecture, MOESI takes a performance hit if the application doesn't
>>try to minimize use of the same cache line (or the same group of memory words)
>>on multiple CPUs at the same time. That's the main change I made when doing the
>>NUMA conversion for the older NUMA Alpha box I reported on last year, and we did
>>the _same_ thing again to get the performance on the Opteron back up to a
>>reasonable level.
>
>Is there any way to detect the necessary information and have Crafty configure
>itself dynamically? It seems you then wouldn't have to worry about the SMP problem
>creeping up when Crafty runs on other processors. I don't know what is involved,
>as I have little programming experience, but figured I'd bring it up anyway. If it
>can be done, it seems it would be worthwhile.

I think I have already done this. One particular thing to avoid is a single
cache line holding a value that is constantly read _and_ modified by all
threads. MOESI continually produces a lot of "chatter" in this case, and
the cache line is almost always marked "invalid" in the other CPUs' caches.
That was the biggest problem we had to fix, and solving it required making
that global variable "thread-local".
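As a rough illustration of that chatter (a toy model, not Crafty's code), the sketch below tracks the MOESI state of one cache line in every CPU's cache and counts bus events. It assumes no evictions and a worst-case round-robin interleaving in which each thread reads the value and then writes it back. The names `CacheLine`, `shared_counter` and `thread_local_counters` are hypothetical.

```python
from collections import Counter

# MOESI states of one memory line as seen by one CPU's cache.
M, O, E, S, I = "M", "O", "E", "S", "I"


class CacheLine:
    """Toy model: MOESI state of a single line in each CPU's cache (no evictions)."""

    def __init__(self, ncpus: int) -> None:
        self.state = [I] * ncpus
        self.stats = Counter()  # bus events caused by accesses to this line

    def read(self, cpu: int) -> None:
        if self.state[cpu] != I:
            return  # M, O, E and S can all satisfy a read locally
        self.stats["read_miss"] += 1
        others = [c for c in range(len(self.state))
                  if c != cpu and self.state[c] != I]
        for c in others:
            if self.state[c] == M:
                self.state[c] = O  # dirty copy supplies data, keeps ownership
            elif self.state[c] == E:
                self.state[c] = S  # clean exclusive copy becomes shared
        self.state[cpu] = S if others else E

    def write(self, cpu: int) -> None:
        st = self.state[cpu]
        if st == M:
            return  # already exclusive and dirty
        if st == E:
            self.state[cpu] = M  # silent upgrade, no bus traffic
            return
        if st == I:
            self.stats["write_miss"] += 1  # read-for-ownership
        # From O, S or I: every other valid copy must be invalidated.
        for c in range(len(self.state)):
            if c != cpu and self.state[c] != I:
                self.state[c] = I
                self.stats["invalidation"] += 1
        self.state[cpu] = M


def shared_counter(ncpus: int, iterations: int) -> Counter:
    """All threads read-modify-write one global value on one shared line."""
    line = CacheLine(ncpus)
    for _ in range(iterations):
        for cpu in range(ncpus):  # round-robin: worst-case interleaving
            line.read(cpu)
            line.write(cpu)
    return line.stats


def thread_local_counters(ncpus: int, iterations: int) -> Counter:
    """Each thread read-modify-writes its own copy on its own line."""
    lines = [CacheLine(ncpus) for _ in range(ncpus)]
    for _ in range(iterations):
        for cpu in range(ncpus):
            lines[cpu].read(cpu)
            lines[cpu].write(cpu)
    total = Counter()
    for line in lines:
        total.update(line.stats)
    return total


if __name__ == "__main__":
    print("shared global:", dict(shared_counter(2, 1000)))
    print("thread-local: ", dict(thread_local_counters(2, 1000)))
```

With two CPUs and 1,000 iterations each, the shared value costs 2,000 read misses and 1,999 invalidations, because every access finds the line invalidated by the other CPU's last write. The thread-local version costs 2 read misses (one per private line) and no invalidations. Real hardware adds evictions and irregular timing, but the contrast is the one described above.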

Making that variable thread-local is the main thing that helped my dual Xeon
start to scale properly, because the PIV now uses 128-byte cache lines and I was
bouncing that one cache line back and forth between the two CPUs.
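To make the 128-byte figure concrete, here is a small layout check. It is again a hypothetical sketch, not anything taken from Crafty. It computes which cache line each thread's 8-byte slot lands in for a given stride between slots, assuming the slots start at a line-aligned address and a 128-byte line as stated for the PIV. `line_of`, `slot_lines` and `PaddedCounter` are illustrative names.

```python
import ctypes

# Assumed line size: the post says the PIV uses 128-byte lines. A real program
# would ideally obtain this from the CPU at run time rather than hard-code it.
CACHE_LINE = 128


def line_of(address: int, line_size: int = CACHE_LINE) -> int:
    """Index of the cache line containing the byte at `address`."""
    return address // line_size


def slot_lines(nthreads: int, stride: int, base: int = 0) -> list[int]:
    """Cache line used by each thread when thread i owns the 8-byte slot at
    base + i * stride (base assumed to be line-aligned)."""
    return [line_of(base + i * stride) for i in range(nthreads)]


class PaddedCounter(ctypes.Structure):
    """One 8-byte per-thread value padded out to a full cache line, so that
    consecutive array elements never share a line (provided the array itself
    starts on a line boundary, which ctypes does not guarantee)."""
    _fields_ = [
        ("value", ctypes.c_uint64),
        ("pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
    ]


if __name__ == "__main__":
    for stride in (8, 64, ctypes.sizeof(PaddedCounter)):
        lines = slot_lines(2, stride)
        verdict = "shared" if lines[0] == lines[1] else "separate"
        print(f"stride {stride:3d}: lines {lines} -> {verdict}")
```

Adjacent 8-byte variables (stride 8) obviously share a line. So does a stride of 64 bytes, which would be enough padding on a CPU with 64-byte lines but still puts two threads on the same 128-byte PIV line. Only padding each per-thread value to the full 128 bytes gives each thread a line of its own.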
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.