Computer Chess Club Archives

Subject: Re: Is Xeon 5000 (Dempsey) with FB-DIMMs faster than Opteron 280?

Author: Vincent Diepeveen
Date: 18:00:19 11/09/05
On November 09, 2005 at 13:48:13, Robert Hyatt wrote:

>On November 08, 2005 at 22:25:15, Vincent Diepeveen wrote:
>
>>On November 08, 2005 at 16:23:22, Gerd Isenberg wrote:
>>
>>>On November 08, 2005 at 15:26:45, Vincent Diepeveen wrote:
>>>
>>>>On November 08, 2005 at 13:17:42, Yar wrote:
>>>>
>>>>>Hello,
>>>>>
>>>>>Here is a review (14 pages in total) of Intel's upcoming Xeon 5000 (Dempsey). Sorry,
>>>>>it's only in German. It seems it's faster than the Opteron 280.
>>>>>http://www.tecchannel.de/server/hardware/432957/
>>>>>
>>>>>With best regards,
>>>>>
>>>>>Yar
>>>>
>>>>It should be a fast CPU, that Dempsey. However, that Xeon won't be available until around
>>>>January 2007, and I guess it will cost around 5000 euro per CPU in the
>>>>quad version, if you can get it for that, as you'll probably have to buy
>>>>1000 at a time to get them for around 4500 dollars apiece.
>>>>
>>>>So effectively a quad Xeon dual-core system will cost around $40k in January 2007.
>>>>
>>>>By that time, of course, a quad Opteron quad-core is nearly 2 times faster
>>>>and exactly half the price.
>>>>
>>>>Please note that it's not certain whether the IPC of the Intel Pentium-M at such
>>>>high clock speeds and dual core will be better than AMD's. I'm counting on it
>>>>being a lot slower, because in order to clock the Pentium-M higher, Intel
>>>>will need to make the pipeline longer and will probably move from a 2-cycle L1
>>>>to a 3-cycle L1. In that case the processor is similar to the Opteron from a
>>>>chess-programming viewpoint.
>>>>
>>>>Of course the Xeons have bigger on-chip L2 or even L3 caches than AMD. That's
>>>>nice for certain applications used in benchmarks, but in real life it's not a
>>>>huge advantage.
>>>>
>>>>A few MB is plenty for computer chess at the moment.
>>>>
>>>>On the other hand, could you tell me whether this Xeon has an on-die memory
>>>>controller or not?
>>>>
>>>>Because *that* matters a lot. Hashtable probes are dominated by TLB thrashing and
>>>>memory latency on a big hashtable. With 64-bit CPUs and the clock that keeps
>>>>ticking, RAM sizes will increase too, meaning that the latency you lose to
>>>>TLB thrashing (transposition table, eval table; not so much the pawn table, as that'll
>>>>be in L2 cache for the majority of accesses) is significant.
>>>>
>>>>If Intel plans to do that off-chip via some sort of chipset, that is a huge
>>>>drawback of this Xeon CPU for databases and chess. In database benchmarks that use
>>>>some small database they can get away with a big L2/L3, but in real life
>>>>there is no escape. It's just dead slow.
>>>>
>>>>So I do look forward to the Pentium-M, but the price at which Intel usually sells
>>>>good CPUs doesn't mean that we will see more quads online.
>>>>
>>>>Vincent
>>>
>>>
>>>Yes, memory latency seems worse.
>>>
>>>OTOH Intel has more than twice the bandwidth using 128-bit SSE2/3
>>>load/store instructions, which is of course not so important for computer chess.
>>>
>>>Cache/Memory: 128-bit transfer
>>>Bandwidth in MByte/s
>>>
>>>           Dempsey Paxville Opteron 280
>>>L1          47340    41444    18360
>>>L2          24928    22105     9448
>>>Memory       3606     4127     3316
>>
>>That's just on paper, of course.
>>
>>First of all, on a quad machine the 8 Intel cores must share 3 GB/s of memory
>>bandwidth, which is *theoretical* bandwidth.
>>
>>Whereas the 8 cores of a quad Opteron have 4 memory controllers. That's a factor
>>4 advantage to Opteron in memory bandwidth.
>>
>>I haven't read the L1 and L2 cache bandwidth specs of the Intel chips.
>>
>>May I remind you that they made similarly rosy predictions for the P4 in the
>>past. It would have a 2-cycle L1 cache, and so on.
>>
>>Prescott actually has a 4-cycle L1 cache.
>>
>>P4 would execute 4 instructions a cycle, because of having 2 double-clocked
>>integer units.
>>
>>In practice it is limited to 3 instructions a cycle, and
>>nearly no one achieves that, thanks to other limitations.
>>
>>Intel says we should all use CMOV constructs to avoid branch mispredictions.
>>
>>Actually, their own compiler doesn't generate them when using P4 switches,
>>because on Prescott a CMOV costs at least 7 cycles, versus 2 for AMD.
>>
>>So you can quote anything on paper here. The reality can be expressed in money
>>very easily.
>>
>>Namely, those Xeon chips can never compete on price with quad-core
>>Opterons, which will be on the market long before the DUAL-core Xeon is
>>there.
>>
>>How can 4 AMD cores ever be slower than 2 Intel cores?
>>
>>If you plan to stream, for example, SSE2 to processors executing all kinds of code,
>>then obviously 4 AMD cores will always beat 2 Intel cores.
>>
>>Especially if the AMD ones have already been running for months while the Intel
>>ones are still on a sheet of paper in the factory.
>>
>>Please realize that for gflop calculations, memory bandwidth is the
>>bottleneck. If 4 CPUs (8 cores) from Intel can get at most 3 gigabytes a second,
>>then obviously AMD will always win when it can stream 12 gigabytes a second to
>>them.
>>
>>When on paper Intel can receive 3.6 GB/s and AMD 3.3 GB/s, that's not really
>>relevant.
>>
>>It's 1 memory controller for Intel, versus 4 for AMD.
>
>
>That is if you have four processors.  But the dual cores are sharing one
>controller, and the dual cores most definitely compete for that one
>HyperTransport interface too, since the multiple cache controllers and processor
>cores place a high demand on a single path (per chip, not per core) to memory.
>And then there is the issue of NUMA memory, which also reduces that high
>theoretical AMD bandwidth significantly...
>
>The dual-core chips are _not_ as good as two single-core chips, based on lots of
>benchmarking.  They are very good, don't get me wrong, but the shared
>HyperTransport means 2x the traffic through one external interface, which can and
>does produce a bottleneck.

The math is very simple. A quad AMD has 4 memory controllers, whereas a quad Xeon
has 1 memory controller.

HyperTransport is never the problem, because even in its current narrow form it
already delivers 14.4 GB/s. Expanding that to 28.8 GB/s is pretty trivial: that's
simply a 2 times wider HyperTransport channel. So when there is RAM that
supports a higher bandwidth, it's easy to scale that up.

So it's 4 memory controllers in a quad for AMD against 1 for Intel.
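To put the controller count in per-core terms, here is a back-of-envelope sketch. It assumes the 3 GB/s per memory controller figure used earlier in this thread for both vendors (an illustrative assumption, not a measured spec), a quad-socket dual-core box (8 cores), and an ideal split where every core streams only from its own socket's controller. The AMD figure already has the two cores of each chip sharing one controller; it does not model NUMA traffic between sockets, which would pull the AMD number down.

```python
# Back-of-envelope per-core memory bandwidth for a quad-socket, dual-core box.
# GBPS_PER_CONTROLLER is the 3 GB/s figure quoted in this thread, assumed
# equal for both vendors purely for illustration.

GBPS_PER_CONTROLLER = 3.0
CORES = 8  # 4 sockets x 2 cores


def per_core_bandwidth(controllers: int, gbps_per_controller: float,
                       cores: int) -> float:
    """Ideal even split of aggregate controller bandwidth across all cores.
    Ignores NUMA hops, link contention and chipset overhead."""
    return controllers * gbps_per_controller / cores


intel = per_core_bandwidth(1, GBPS_PER_CONTROLLER, CORES)  # one shared controller
amd = per_core_bandwidth(4, GBPS_PER_CONTROLLER, CORES)    # one on-die controller per socket

print(f"Intel (1 controller):  {intel:.3f} GB/s per core")
print(f"AMD   (4 controllers): {amd:.3f} GB/s per core")
print(f"ratio: {amd / intel:.1f}x")
```

The 4x ratio is nothing more than the controller count. How much of the AMD figure survives in practice depends on the effects Robert raises: two cores per chip sharing one path to memory, plus remote NUMA accesses.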

Please note that even single-CPU AMDs have a memory bandwidth of around 6.1 GB/s
according to ScienceMark 2.0:

http://www.amdzone.com/modules.php?op=modload&name=Sections&file=index&req=printpage&artid=28

So whatever Intel shows up with, memory bandwidth and latency will be very weak
compared to AMD as long as Intel doesn't use on-die memory controllers.
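The hashtable point from earlier in the thread (random probes into a big transposition table thrash the TLB, so almost every probe pays full memory latency) can also be put in rough numbers. The sketch below assumes a hypothetical 512-entry TLB and 4 KB pages; those values are illustrative, not the spec of any Xeon or Opteron. It computes the best-case fraction of uniformly random probes whose page is already mapped, for a few table sizes.

```python
# Best-case TLB hit rate for uniformly random hashtable probes.
# TLB_ENTRIES and PAGE_SIZE are hypothetical, illustrative values.

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

TLB_ENTRIES = 512      # hypothetical TLB size
PAGE_SIZE = 4 * KB     # common small-page size


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory the TLB can map at once: entries x page size."""
    return entries * page_size


def random_probe_hit_rate(table_bytes: int, entries: int = TLB_ENTRIES,
                          page_size: int = PAGE_SIZE) -> float:
    """Best case: the TLB holds only pages of this table and each probe lands
    on a uniformly random page, so the chance that the page is already
    mapped is reach / table size (capped at 1)."""
    return min(1.0, tlb_reach(entries, page_size) / table_bytes)


for size, label in [(64 * MB, "64 MB"), (512 * MB, "512 MB"), (4 * GB, "4 GB")]:
    rate = random_probe_hit_rate(size)
    print(f"{label:>6} table: ~{rate:.3%} of random probes avoid a TLB miss")
```

With these assumptions the best-case hit rate drops from about 3% for a 64 MB table to about 0.05% for a 4 GB table, which is the "RAM sizes will increase too" effect: as the hashtable grows, nearly every probe needs a page walk on top of the memory access itself.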

HyperTransport is the key to success for AMD.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.