Computer Chess Club Archives


Subject: Re: Intel four-way 2.8 Ghz system is just Amazing ! - Not hardly

Author: Eugene Nalimov

Date: 10:54:07 11/12/03


On November 12, 2003 at 11:55:20, Gian-Carlo Pascutto wrote:

>On November 11, 2003 at 23:42:45, Eugene Nalimov wrote:
>
>>My point is: it's possible that due to the fact that quad Opteron is NUMA --
>>not SMP -- system, for SMP-only program performance on quad Opteron can be
>>worse than on *real* quad SMP system, even when for one CPU Opteron
>>performance is much better. Itanium was used only as an example of such
>>system, I never recommended rewriting any program for it.
>
>I don't understand how. The NUMA part is RAM. Even worst case on the Opteron
>RAM is faster than Xeon SMP. So how could it ever be worse?
>
>--
>GCP

I can think of several reasons why scaling can be very bad if all the memory is
allocated on one CPU:

(1) Memory *bandwidth*. All the memory requests go to exactly that CPU, so all
CPUs have to use exactly one (or two) channels to memory. On Xeons *worst case*
memory bandwidth is higher.

(2) CPU-to-CPU *bandwidth* -- memory transfer speed is limited by the fact that
*one* CPU has to process memory requests for *all* CPUs. Also notice that
for "normal" topology

  0----1
  |    |
  |    |
  2----3

CPU#3 has to go through either CPU#1 or CPU#2 to reach memory of CPU#0.
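
To make the hop count concrete, here is a minimal sketch that runs a breadth-first search over the square topology drawn above. The link table and the function name hops_from are my own illustration (assumptions, not anything taken from Crafty or from AMD documentation); it only counts links and says nothing about actual latency or bandwidth.

```python
from collections import deque

# Square topology from the diagram: links 0-1, 0-2, 1-3, 2-3 (no diagonals).
# This adjacency table is an illustrative assumption based on the drawing.
LINKS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}


def hops_from(source, links):
    """Breadth-first search: minimum number of links from source to each CPU."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cpu = queue.popleft()
        for neighbour in links[cpu]:
            if neighbour not in dist:
                dist[neighbour] = dist[cpu] + 1
                queue.append(neighbour)
    return dist


# Worst case discussed in the post: all memory was allocated at CPU#0.
hops = hops_from(0, LINKS)
for cpu in sorted(hops):
    print(f"CPU#{cpu} -> memory of CPU#0: {hops[cpu]} hop(s)")
```

CPU#1 and CPU#2 are one link away from CPU#0, while CPU#3 is two links away, so its requests also consume bandwidth on an intermediate CPU's links.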

(3) MOESI vs. MESI synchronisation protocols -- I was told that on MOESI (used
by AMD) traffic due to shared *modified* cache lines is much higher than on MESI
(used by Intel). If that is really so (I haven't investigated it myself), it
could probably explain why on 32-bit Athlons Crafty prior to 19.5 scaled worse
than on Pentium 4.

In any case, here are the results of Crafty 19.4 scaling on 2 different Opteron
systems and on an Itanium2 system (measured before Crafty became NUMA-aware and
before we decreased the amount of shared modifiable data):

Opteron system I:
2 CPUs:    1.57x
3 CPUs:    1.99x
4 CPUs:    1.98x

Opteron system II:
2 CPUs:    1.61x
3 CPUs:    2.13x
4 CPUs:    2.35x

Itanium2 system:
2 CPUs:    1.84x
3 CPUs:    2.63x
4 CPUs:    3.22x

Crafty 19.5 scales much better. On Opteron system II it reaches 3.8x on 4P.
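
One way to read the figures above is as parallel efficiency, i.e. speedup divided by the number of CPUs. The sketch below just does that arithmetic on the numbers quoted in this post; the dictionary layout and the name efficiency are illustrative assumptions, and no new measurements are introduced.

```python
# Speedups quoted in the post for Crafty 19.4, keyed by CPU count.
SPEEDUPS_19_4 = {
    "Opteron system I": {2: 1.57, 3: 1.99, 4: 1.98},
    "Opteron system II": {2: 1.61, 3: 2.13, 4: 2.35},
    "Itanium2 system": {2: 1.84, 3: 2.63, 4: 3.22},
}

# Crafty 19.5 on Opteron system II, 4 CPUs, as quoted in the post.
SPEEDUP_19_5_OPTERON_II_4P = 3.8


def efficiency(speedup, cpus):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / cpus


for system, results in SPEEDUPS_19_4.items():
    for cpus, speedup in sorted(results.items()):
        print(f"{system}, {cpus} CPUs: {efficiency(speedup, cpus):.1%}")

print(f"Crafty 19.5, Opteron system II, 4 CPUs: "
      f"{efficiency(SPEEDUP_19_5_OPTERON_II_4P, 4):.1%}")
```

On these numbers, Opteron system I gets just under half of the ideal speedup at 4 CPUs with 19.4, while 19.5 on Opteron system II gets 95% of it.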

Thanks,
Eugene



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.