Computer Chess Club Archives


Subject: Re: Is Xeon 5000 (Dempsey) with FB-DIMMs faster than Opteron 280?

Author: gerold daniels

Date: 17:09:36 11/09/05

On November 09, 2005 at 13:48:13, Robert Hyatt wrote:

>On November 08, 2005 at 22:25:15, Vincent Diepeveen wrote:
>
>>On November 08, 2005 at 16:23:22, Gerd Isenberg wrote:
>>
>>>On November 08, 2005 at 15:26:45, Vincent Diepeveen wrote:
>>>
>>>>On November 08, 2005 at 13:17:42, Yar wrote:
>>>>
>>>>>Hello,
>>>>>
>>>>>Here is a review (14 pages in total) of Intel's upcoming Xeon 5000 (Dempsey). Sorry,
>>>>>it's only in German. It seems it's faster than the Opteron 280.
>>>>>http://www.tecchannel.de/server/hardware/432957/
>>>>>
>>>>>With best regards,
>>>>>
>>>>>Yar
>>>>
>>>>It should be a fast CPU, that Dempsey. However, that Xeon will arrive around January
>>>>2007, and I guess it will cost around 5000 euro per CPU in the
>>>>quad version, if you can get it for that, as you'll probably have to buy
>>>>1000 at a time to get them for around 4500 dollars apiece.
>>>>
>>>>So effectively a quad Xeon dual-core system will cost around $40k in January 2007.
>>>>
>>>>By that time, of course, a quad Opteron quad-core will be nearly 2 times faster
>>>>and exactly half the price.
>>>>
>>>>Please note that it's not certain whether the IPC of the Intel Pentium M at such
>>>>high clock speeds, and as a dual core, will be better than AMD's. I'm counting on
>>>>it being a lot slower, because in order to clock the Pentium M higher, Intel
>>>>will need to make the pipeline longer and will probably move from a 2-cycle L1
>>>>to a 3-cycle L1. In that case the processor is similar to the Opteron from a
>>>>chess programming viewpoint.
>>>>
>>>>Of course the Xeons have bigger on-chip L2 or even L3 caches than AMD. That's
>>>>nice for certain applications used in benchmarks, but in real life it's not a
>>>>huge advantage.
>>>>
>>>>A few MBs is plenty for computer chess at the moment.
>>>>
>>>>On the other hand, could you tell me whether this Xeon has an on-die memory
>>>>controller or not?
>>>>
>>>>Because *that* matters a lot. Hashtables are a matter of TLB thrashing and memory
>>>>latency to a big hashtable. With 64-bit CPUs and the clock that keeps
>>>>ticking, RAM sizes will increase too, meaning that the latencies you lose to
>>>>TLB thrashing (transposition table, eval table; not so much the pawn table, as that
>>>>will be in L2 cache for the majority of accesses) are significant.
>>>>
>>>>If Intel plans to do that via some sort of off-chip chipset, then that is a huge
>>>>drawback of this Xeon CPU for databases and chess. In database benchmarks using
>>>>some small database they can get away with a big L2/L3, but in real life
>>>>there is no escape. It's just dead slow.
>>>>
>>>>So I do look forward to the Pentium M, but the price at which Intel usually sells
>>>>good CPUs doesn't mean that we will see more quads online.
>>>>
>>>>Vincent
>>>
>>>
>>>Yes, memory latency seems worse.
>>>
>>>OTOH Intel has more than two times better bandwidth using 128-bit SSE2/3
>>>load/store instructions, which is of course not so important for computer chess.
>>>
>>>Cache/memory: 128-bit transfers
>>>Bandwidth in MByte/s
>>>
>>>Level     Dempsey  Paxville  Opteron 280
>>>L1          47340     41444        18360
>>>L2          24928     22105         9448
>>>Memory       3606      4127         3316
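To put numbers on the "more than two times" claim, this short sketch divides each Dempsey figure in the table by the corresponding Opteron 280 figure. The dictionary only restates the review's numbers; `ratio_vs` is an illustrative helper.

```python
# 128-bit transfer bandwidth in MByte/s, copied from the table above.
BANDWIDTH = {
    "L1":     {"Dempsey": 47340, "Paxville": 41444, "Opteron 280": 18360},
    "L2":     {"Dempsey": 24928, "Paxville": 22105, "Opteron 280": 9448},
    "Memory": {"Dempsey": 3606,  "Paxville": 4127,  "Opteron 280": 3316},
}


def ratio_vs(level, cpu, baseline="Opteron 280"):
    """Bandwidth of `cpu` relative to `baseline` at one level of the hierarchy."""
    return BANDWIDTH[level][cpu] / BANDWIDTH[level][baseline]


for level in BANDWIDTH:
    print(f"{level:<7} Dempsey / Opteron 280 = {ratio_vs(level, 'Dempsey'):.2f}")
```

That works out to roughly 2.6x at L1 and L2 but under 10% more at main memory, so the large advantage is confined to the caches.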
>>
>>That's of course just paper.
>>
>>First of all, on a quad machine, the 8 Intel cores must share 3GB/s of memory
>>bandwidth, and that is *theoretical* bandwidth.
>>
>>Whereas the 8 cores of a quad Opteron have 4 memory controllers. So that's a factor
>>of 4 advantage to the Opteron in memory bandwidth.
>>
>>I didn't read the bandwidth specs for the L1 and L2 caches of the Intel chips.
>>
>>May I remind you that they made similar rosy predictions for the P4 in the
>>past. It would have a 2-cycle L1 cache, blah blah.
>>
>>Prescott actually has a 4-cycle L1 cache.
>>
>>The P4 would execute 4 instructions a cycle, because of having 2 double-clocked
>>integer units.
>>
>>Its practical limitations actually limit it to 3 instructions a cycle, and
>>nearly no one can achieve that, thanks to other limitations.
>>
>>We should all use CMOV constructs, says Intel, to avoid branch mispredictions.
>>
>>Actually, their own compiler doesn't generate them when using P4 switches,
>>because on Prescott a CMOV carries a penalty of at least 7 cycles, versus 2 for AMD.
>>
>>So you can quote anything on paper here. The reality can be expressed in money
>>very easily.
>>
>>That is, those Xeon chips can never compete on price against quad-core
>>Opterons, which will be on the market long before the DUAL-core Xeon is
>>there.
>>
>>How can 4 AMD cores ever be slower than 2 Intel cores?
>>
>>If you plan to stream, for example, SSE2 data to processors executing all kinds
>>of code, then obviously 4 AMD cores will always beat 2 Intel cores.
>>
>>Especially if the AMD ones can already be running for months while the Intels
>>are still on a sheet of paper in the factory.
>>
>>Please realize that, in terms of bandwidth for GFLOP calculations, memory is the
>>bottleneck. If 4 CPUs (8 cores) from Intel can get at most 3 gigabytes a second,
>>then obviously AMD will always win when it can stream 12 gigabytes a second to
>>its cores.
>>
>>When on paper Intel can receive 3.6GB/s and AMD 3.3GB/s, that's not
>>really relevant.
>>
>>It's 1 memory controller for Intel, versus 4 for AMD.
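The factor of 4 follows from dividing the stated aggregate bandwidth by the 8 cores, under the simplifying assumption that all cores stream at once and share bandwidth evenly. This ignores NUMA effects and the per-chip link sharing of dual-core parts. The 3 GB/s per controller is implied by the post's 3 vs 12 GB/s figures; `per_core_bandwidth` is a hypothetical helper.

```python
def per_core_bandwidth(total_gb_per_s, cores):
    """Even share of aggregate memory bandwidth per core, assuming every core
    streams at the same time and the bandwidth is divided fairly."""
    return total_gb_per_s / cores


CORES = 8                      # 4 sockets x 2 cores
CONTROLLER_GB_PER_S = 3.0      # per-controller figure implied by the post

intel = per_core_bandwidth(1 * CONTROLLER_GB_PER_S, CORES)   # one shared controller
amd = per_core_bandwidth(4 * CONTROLLER_GB_PER_S, CORES)     # one controller per socket

print(f"Intel: {intel:.3f} GB/s per core")
print(f"AMD:   {amd:.3f} GB/s per core")
print(f"AMD/Intel ratio: {amd / intel:.1f}x")
```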
>
>
>That is if you have four processors.  But the dual cores are sharing one
>controller, and the dual cores most definitely compete for that one
>HyperTransport interface too, since the multiple cache controllers and processor
>cores place a high demand on a single path (per chip, not per core) to memory.
>And then there is the issue of NUMA memory, which also reduces that high
>theoretical AMD bandwidth significantly...
>
>The dual-core chips are _not_ as good as two single-core chips, based on lots of
>benchmarking.  They are very good, don't get me wrong, but the shared
>HyperTransport link means 2x the traffic through one external interface, which
>can and does produce a bottleneck.
>

Thanks for clearing this up, Robert.
>
>
>>
>>Now for multithreaded games, and for SSE2 calculations as in all kinds of
>>graphics work, that memory performance is a big performance hit.
>>
>>Additionally, L1 cache bandwidth for chess will be dominated by the LATENCY
>>of getting a single doubleword out of L1 and by the number of reads you can
>>do simultaneously there.
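The latency argument is essentially Little's law: sustained throughput equals the number of requests in flight divided by latency, capped here by the number of cache ports. The sketch assumes 8-byte loads and fully pipelined ports; the 2/3/4-cycle latencies and 1 vs 2 ports echo figures mentioned in this thread and are used only for illustration. `l1_bytes_per_cycle` is a hypothetical helper, not a model of any specific CPU.

```python
def l1_bytes_per_cycle(latency_cycles, loads_in_flight, ports, bytes_per_load=8):
    """Sustained L1 read throughput in bytes per cycle (Little's law).

    Loads completed per cycle = loads in flight / latency, but a pipelined
    cache can never accept more than one load per port per cycle.
    """
    loads_per_cycle = min(loads_in_flight / latency_cycles, ports)
    return loads_per_cycle * bytes_per_load


# Dependent loads (each address comes from the previous load): one in flight,
# so latency alone sets the throughput.
for latency in (2, 3, 4):
    bw = l1_bytes_per_cycle(latency, loads_in_flight=1, ports=2)
    print(f"{latency}-cycle L1, dependent loads: {bw:.2f} bytes/cycle")

# Plenty of independent loads: now the number of ports is the limit.
for ports in (1, 2):
    bw = l1_bytes_per_cycle(3, loads_in_flight=8, ports=ports)
    print(f"{ports} port(s), independent loads: {bw:.2f} bytes/cycle")
```

Search code with many dependent probes sits closer to the first case, which is why an extra cycle of L1 latency can matter more than peak bandwidth figures.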
>>
>>I remember Intel's optimistic specs from the past. They were not true.
>>
>>What will the Achilles' heels be this time?
>>
>>If there aren't any, it's a killer CPU for software that doesn't need
>>RAM!
>>
>>If there are Achilles' heels again, Intel has a major problem.
>>
>>But I do realize the price of those CPUs. Just look at the size of the L2
>>cache!
>>
>>What was it, 16MB or something?
>>
>>That's not gonna be CHEAP.
>
>Would not speculate there.  As process feature sizes go down, transistor count
>goes up, with no increase in cost at all.  It used to be "how can we squeeze all
>this stuff (L1/L2/floating point/multiple pipes/etc.) into this small number of
>transistors?"  It is now more "what on earth can we use all these transistors
>for?"  At 6 transistors per bit for SRAM, a megabyte (8M bits) requires about 50M
>transistors, which is chickenfeed...
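A worked version of the 6-transistors-per-bit estimate, assuming binary megabytes and counting only the storage cells (tags, ECC, decoders and sense amplifiers are ignored). One megabit comes to about 6.3M transistors, so a megabyte is eight times that, roughly 50M; the 16 MB size is the one asked about earlier in the thread. `sram_cell_transistors` is an illustrative helper.

```python
TRANSISTORS_PER_BIT = 6              # classic 6T SRAM cell, as stated in the post
BITS_PER_MEGABYTE = 8 * 1024 * 1024  # 8,388,608 bits in a binary megabyte


def sram_cell_transistors(megabytes):
    """Transistors in the storage cells only (no tags, ECC, decoders or sense amps)."""
    return megabytes * BITS_PER_MEGABYTE * TRANSISTORS_PER_BIT


for size_mb in (1, 16):
    print(f"{size_mb:>2} MB of SRAM cells: {sram_cell_transistors(size_mb):,} transistors")
```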
>
>
>
>>
>>So whatever its performance, it won't be able to compete against AMD in that
>>sense.
>>
>>You wonder about SSE2 here. Well, let me ask you: how many SSE2 execution units
>>does it have?
>>
>>We know AMD has 2.
>>P4 has 1.
>>
>>AMD completely outperforms the P4 there.
>>
>>Why would this be different for a Pentium M on steroids? Can you give some
>>explanations?
>>
>>Let me give counterarguments.
>>
>>a) it will have an extremely TINY L1 cache
>>b) it will SHARE the L2 cache, so it has a DEAD SLOW L2 cache in terms of
>>latency.
>
>Shared L2 is not necessarily bad.  On AMD the MOESI traffic can get _very_ high
>if the two processor cores are modifying data that is shared...
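The MOESI traffic point can be illustrated with a toy model of one cache line shared by two cores. This is a deliberately simplified sketch, not AMD's actual protocol implementation: it counts one "transaction" whenever a core misses on a read or must fetch the line or invalidate the other copy before writing, and it ignores timing, write-backs and the memory controller. The class name `MoesiLine` is hypothetical.

```python
class MoesiLine:
    """Toy MOESI model of ONE cache line shared by several cores.

    States: M(odified), O(wned), E(xclusive), S(hared), I(nvalid).
    `transactions` counts every time a core must go off-core: a read miss,
    or a write that has to fetch the line or invalidate other copies.
    """

    def __init__(self, cores=2):
        self.state = ["I"] * cores
        self.transactions = 0

    def _others(self, core):
        return [c for c in range(len(self.state)) if c != core]

    def read(self, core):
        if self.state[core] != "I":
            return                          # hit in M, O, E or S: no traffic
        self.transactions += 1
        others = self._others(core)
        if all(self.state[c] == "I" for c in others):
            self.state[core] = "E"          # no other copy: exclusive, clean
            return
        for c in others:
            if self.state[c] == "M":
                self.state[c] = "O"         # dirty owner keeps the line, supplies data
            elif self.state[c] == "E":
                self.state[c] = "S"
        self.state[core] = "S"

    def write(self, core):
        if self.state[core] in ("M", "E"):
            self.state[core] = "M"          # already exclusive: silent upgrade
            return
        self.transactions += 1              # gain ownership, invalidate other copies
        for c in self._others(core):
            self.state[c] = "I"
        self.state[core] = "M"


# Two cores taking turns modifying the same line vs one core doing all writes.
ping_pong = MoesiLine()
for i in range(10):
    ping_pong.write(i % 2)

private = MoesiLine()
for _ in range(10):
    private.write(0)

print("shared writer traffic:", ping_pong.transactions)
print("single writer traffic:", private.transactions)
```

With these rules the alternating writers pay one transaction on every write, while the single writer pays only for the first one; that ping-pong is the high traffic described for two cores modifying shared data.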
>
>
>>c) Intel has a habit of trying to get away with a very cheap L1 cache too,
>>with just 1 port. AMD already had 2 ports on the K7, and so does the K8.
>>
>>What will Intel do this time to keep this CPU cheap to produce, while
>>charging a fortune when you buy it?
>>
>>Who knows, perhaps Intel has a good CPU now?
>>
>>Let's hope so.
>>
>>>Also, general SSE performance is much better for the future Intels.
>>>Hopefully some motivation for AMD to work on 128-bit ALUs ;-)
>>>Gerd



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.