Computer Chess Club Archives


Subject: Re: Is Xeon 5000 (Dempsey) with FB-DIMMs faster than Opteron 280?

Author: Robert Hyatt

Date: 19:58:11 11/10/05


On November 10, 2005 at 20:40:57, Aaron Gordon wrote:

>On November 10, 2005 at 00:28:24, Robert Hyatt wrote:
>
>>>HyperTransport isn't the bottleneck at all, since it easily delivers 14.4GB/s
>>>per channel.
>>
>>There is more to this than "bandwidth".  There is "latency" and there are
>>"conflicts".
>
>
>I don't know much about the technical workings of the new bus system AMD uses
>on Athlon 64/Opteron systems. I would appreciate it if you could clear
>something up for me.
>
>I tested my Athlon 64 3700+ (2.2GHz, single CPU, single core) running at the
>standard 200MHz bus with my HT multiplier at the normal 5 (5x200=1000MHz HT). I
>ran 64-bit Crafty 19.15 and got 2321134 nodes/second. I then set the HT
>multiplier down to 1x, resulting in an HT bus of 200MHz (5 times slower). I
>retested Crafty and got exactly the same NPS as before.
>
>If a drastic reduction of the HT bus speed results in zero NPS loss, my initial
>guess is that it would make almost no difference to Crafty whether a dual-core
>(or dual-CPU) setup has one or two HT links. If this is not the case and Crafty
>does run slower, why exactly does this happen?

You didn't test the HT system at all.  Here's why.

Crafty's memory traffic goes to the L1/L2 caches and to main memory.  On a
single-CPU system such as yours, the HyperTransport bus connects the processor
to other devices, such as a PCI bridge, but it has nothing to do with memory or
cache activity.  But on a dual-CPU machine (not dual core, but a machine with
two Opteron chips) the HT becomes critical.  Each CPU can read/write its own
local memory by talking directly to its own memory controller.  But on a
dual-CPU system, which is NUMA in Opteron/AMD-speak, half of the memory is
remote, and reading it requires HT-to-HT traffic to fetch the data from the
other node.  More importantly, the two L1/L2 caches on one dual-core chip have
to stay coherent with the two L1/L2 caches on the other chip of a dual-chip
system, and that coherency traffic also goes over the HT bus.  So you end up
with a lot of traffic when you add a second physical chip.  Whether the chip is
single-core, dual-core or quad-core doesn't really matter here, except that a
dual-core chip has two sets of caches and a quad-core chip has four, which means
there is going to be a +lot+ of traffic between the four local cores and the 12
remote cores on a 4-processor machine where each processor is a quad-core chip.
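To make the counting above concrete, here is a small Python sketch (not from the original post) that, for a machine with a given number of chips and cores per chip, counts how many cores share a chip with a given core, how many sit on other chips, and what fraction of memory is remote. The helper name `numa_topology` is hypothetical, and the sketch assumes memory is split evenly across the chips' memory controllers; it is a counting illustration, not a model of any specific Opteron board.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    local_cores: int               # cores on the same chip, including this one
    remote_cores: int              # cores on other chips, reachable only over HT
    remote_memory_fraction: float  # share of memory attached to other chips


def numa_topology(chips: int, cores_per_chip: int) -> Topology:
    """Hypothetical helper: simple counts for a NUMA box where every chip has
    its own memory controller and memory is (by assumption) split evenly."""
    if chips < 1 or cores_per_chip < 1:
        raise ValueError("need at least one chip and one core per chip")
    total = chips * cores_per_chip
    return Topology(
        local_cores=cores_per_chip,
        remote_cores=total - cores_per_chip,
        # With memory split evenly, (chips - 1) / chips of it lives on other nodes.
        remote_memory_fraction=(chips - 1) / chips,
    )


for chips, cores in [(1, 1), (2, 1), (2, 2), (4, 2), (4, 4)]:
    t = numa_topology(chips, cores)
    print(f"{chips} chip(s) x {cores} core(s): "
          f"{t.local_cores} local, {t.remote_cores} remote cores, "
          f"{t.remote_memory_fraction:.0%} of memory remote")
```

The 4 x 4 case reports 4 local and 12 remote cores, matching the quad-core example above, while the 1 x 1 case has no remote cores and no remote memory, which is the single-chip situation where HT plays no part in memory or cache traffic.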

So as you can see, on a single-CPU system the HT bus is not used at all while
actually playing chess.  But on a 4x2 system like the one I used in the WCCC, it
gets heated up pretty well...

Hope that helps...

There are some pretty good pictures scattered around the net showing how HT
links the processor to other devices.  But for chess, the only thing that
matters is when those "other devices" are other processors; there, the
communication speed is critical, and it becomes a bottleneck.  Of course, that
is what NUMA is all about: a cost-effective interconnect that scales well in
terms of cost, but not as well in terms of performance...
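The cost-versus-performance trade-off can be illustrated with a simple weighted-average model (an illustration added here, not something from the original post): if memory accesses are spread evenly over all nodes, only 1/chips of them are local, and every remote access costs more than a local one, so average latency climbs as chips are added. The function name and the 60 ns / 90 ns figures are placeholder assumptions, not measured Opteron numbers. The model also ignores cache-coherency traffic and HT link contention, which the post says matter as well, and it treats every remote node as one hop away, which may be optimistic when some nodes are more than one HT hop apart.

```python
def average_latency_ns(chips: int, local_ns: float, remote_ns: float) -> float:
    """Hypothetical model: accesses are spread evenly over all nodes' memory,
    so 1/chips of them are local and the rest cross the interconnect."""
    if chips < 1:
        raise ValueError("need at least one chip")
    remote_fraction = (chips - 1) / chips
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Placeholder latencies (assumptions, not measurements): remote costs 1.5x local.
LOCAL_NS = 60.0
REMOTE_NS = 90.0

for chips in (1, 2, 4):
    avg = average_latency_ns(chips, LOCAL_NS, REMOTE_NS)
    print(f"{chips} chip(s): about {avg:.1f} ns average memory latency")
```

Under these assumed numbers, going from one chip to four moves the average from the local latency most of the way toward the remote latency, because three quarters of the memory is now on other nodes.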



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.