Computer Chess Club Archives


Subject: Re: Is Xeon 5000 (Dempsey) with FB-DIMMs faster then Opteron 280 ?

Author: Vincent Diepeveen

Date: 19:25:15 11/08/05


On November 08, 2005 at 16:23:22, Gerd Isenberg wrote:

>On November 08, 2005 at 15:26:45, Vincent Diepeveen wrote:
>
>>On November 08, 2005 at 13:17:42, Yar wrote:
>>
>>>Hello,
>>>
>>>Here is a review (14 pages in total) of Intel's upcoming Xeon 5000 (Dempsey). Sorry,
>>>it's only in German. It seems it's faster than the Opteron 280.
>>>http://www.tecchannel.de/server/hardware/432957/
>>>
>>>With best regards,
>>>
>>>Yar
>>
>>Dempsey should be a fast CPU. However, that Xeon will arrive around January
>>2007, and it will cost, I guess, around 5000 euro per CPU in the
>>quad version, if you can get it for that, as you'll probably have to buy
>>1000 at a time to get them for around 4500 dollars apiece.
>>
>>So effectively a quad dual-core Xeon will cost around $40k in January 2007.
>>
>>By that time of course a quad Opteron quad core is nearly 2 times faster
>>and exactly 2 times cheaper.
>>
>>Please note that it's not certain that the IPC of the Intel Pentium-M at such
>>high clock speeds and dual core will be better than AMD's. I'm counting on it
>>being a lot slower, because in order to clock the Pentium-M higher, Intel
>>will need to lengthen the pipeline and will probably move from a 2-cycle L1
>>to a 3-cycle L1. In that case the processor is similar to the Opteron from
>>a chess programming viewpoint.
>>
>>Of course the Xeons have bigger on-chip L2 or even L3 caches than AMD. That's
>>nice for certain applications that appear in benchmarks, but in real life it's not a
>>huge advantage.
>>
>>A few MB is plenty for computer chess at the moment.
>>
>>On the other hand, could you tell me whether this Xeon has an on-die memory
>>controller or not?
>>
>>Because *that* matters a lot. Hash table access is a matter of TLB-thrashing memory
>>latency into a big hash table. With 64-bit CPUs and the clock that keeps
>>ticking, RAM sizes will increase too, meaning that the latency you lose to
>>TLB thrashing (transposition table, eval table, not so much the pawn table, as that
>>will be in L2 cache for the majority of accesses) is significant.
>>
>>If Intel plans to do that via some sort of off-chip chipset, then that is a huge
>>drawback of this Xeon CPU for databases and chess. In database benchmarks using
>>some small database they can get away with it thanks to a big L2/L3, but in real life
>>there is no escape. It's just dead slow.
>>
>>So I do look forward to the Pentium-M, but given the price at which Intel usually
>>sells good CPUs, it doesn't mean that we will see more quads online.
>>
>>Vincent
>
>
>Yes, memory latency seems worse.
>
>OTOH Intel has more than two times better bandwidth using 128-bit SSE2/3
>load/store instructions, which is of course not so important for computer chess.
>
>Cache/memory: 128-bit transfer
>Bandwidth in MByte/s
>
>           Dempsey Paxville Opteron 280
>L1          47340    41444    18360
>L2          24928    22105     9448
>Memory       3606     4127     3316

That's of course just paper.

First of all, on a quad machine, 8 Intel cores must share 3 GB/s of memory
bandwidth, and that is *theoretical* bandwidth.

Meanwhile the 8 cores of a quad Opteron have 4 memory controllers. So that's a factor
4 advantage to Opteron in memory bandwidth.
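
To make that arithmetic concrete, here is a minimal sketch in C. It assumes, purely for illustration, that every memory controller delivers the same peak rate and that all cores draw on it equally; the function name per_core_gb_s is made up for this example, and the inputs are the round figures from this post, not measurements.

```c
#include <assert.h>

/* Rough per-core share of memory bandwidth.
   Assumption (not a measurement): each controller delivers the same
   peak rate and every core takes an equal share of the total. */
double per_core_gb_s(double gb_s_per_controller, int controllers, int cores) {
    assert(controllers > 0 && cores > 0);
    return gb_s_per_controller * controllers / cores;
}
```

With about 3 GB/s per controller, per_core_gb_s(3.0, 1, 8) gives 0.375 GB/s per core for 8 cores behind one controller, and per_core_gb_s(3.0, 4, 8) gives 1.5 GB/s per core for 8 cores behind four controllers, which is the factor 4.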

I haven't read the L1 and L2 cache bandwidth specs of the Intel chips.

May I remind you that there were similarly heavenly predictions for the P4 in the
past. It would have a 2-cycle L1 cache, blah blah.

Prescott actually has a 4-cycle L1 cache.

P4 would execute 4 instructions a cycle, because it has 2 double-clocked
integer units.

Its practical limitations actually cap it at 3 instructions a cycle, and
nearly no one can get that, thanks to other limitations.

We should all use CMOV constructs, says Intel, to avoid branch mispredictions.

Actually their own compiler doesn't generate them when using P4 switches,
because on Prescott a CMOV costs at least 7 cycles, versus 2 on AMD.
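
For readers who haven't seen what a "CMOV construct" looks like at the source level, here is a small sketch in C. The function names are hypothetical. Whether a compiler turns the ternary into a CMOV or into a branch depends on the compiler and its target switches, which is exactly the point above; the mask version simply has no conditional jump in the source.

```c
#include <assert.h>  /* only needed for the quick checks mentioned below */

/* Ternary select: a compiler MAY emit CMOV for this, or a branch.
   The choice depends on the compiler and its target switches. */
int max_select(int a, int b) {
    return (a > b) ? a : b;
}

/* Mask-based select: no conditional jump in the source at all.
   mask is all ones when a > b, otherwise all zeros
   (assumes two's complement integers). */
int max_mask(int a, int b) {
    int mask = -(a > b);
    return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for any pair of ints, so a check like assert(max_mask(3, 7) == max_select(3, 7)) passes. Which one is faster on a given chip has to be measured, not assumed.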

So you can quote anything on paper here. The reality can be expressed in money
very easily.

Namely, those Xeon chips can never compete on price against quad-core
Opterons, which will be on the market long before the DUAL-core Xeon is
there.

How can 4 cores from AMD ever be slower than 2 from Intel?

If you plan to stream, for example, SSE2 data to processors executing all kinds of code,
then obviously 4 AMD cores will always beat 2 Intel cores.

Especially if the AMD ones have already been running for months while the Intels
still exist only on a sheet of paper in the factory.

Please realize that for gflop calculations, memory bandwidth is the
bottleneck. If 4 CPUs (8 cores) from Intel can get at most 3 gigabytes a second,
then obviously AMD will always win when it can stream 12 gigabytes a second to
its cores.

That on paper Intel can receive 3.6 GB and AMD 3.3 GB a
second is not really relevant.

It's 1 memory controller for Intel versus 4 for AMD.

Now for games that are multithreaded and do SSE2 calculations, as in all kinds of
graphics and such, that memory performance is a big performance hit.

Additionally, L1 cache bandwidth for chess will be dominated by the LATENCY
of getting a single doubleword out of L1 and by the number of reads you can
do simultaneously.
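
One common way to see latency rather than bandwidth is a chain of dependent loads, where each read supplies the address of the next. The sketch below shows only the access pattern. The function names are made up, and the fixed stride is an assumption chosen to keep the example deterministic; a real measurement would use a random permutation to defeat the prefetcher and would time the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Fill next[] so that following next[i] from slot 0 visits every slot
   once before returning to slot 0. For a full cycle, stride must be
   coprime with n (for example, an odd stride with n a power of two). */
void build_cycle(size_t *next, size_t n, size_t stride) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        next[i] = (i + stride) % n;
}

/* Follow the chain for the given number of hops. Each load depends on
   the result of the previous one, so the loads cannot overlap and the
   loop time is governed by load latency, not by bandwidth. */
size_t chase(const size_t *next, size_t start, size_t hops) {
    size_t i = start;
    while (hops--)
        i = next[i];
    return i;
}
```

With a table small enough to stay in L1, dividing the loop time by the hop count approximates the L1 load-to-use latency; a much larger table shifts the same loop toward L2 or RAM latency.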

I remember Intel's optimistic specs from the past. They were not true.

What will be the Achilles heel this time?

If there isn't one, it's a killer CPU for software that doesn't need
RAM!

If there is an Achilles heel again, then Intel has a major problem.

But I do realize what those CPUs will cost. Just look at the size of the L2
cache!

What was it, 16MB or something?

That's not gonna be CHEAP.

So whatever its performance, it won't be able to compete against AMD in that
sense.

You wonder about SSE2 here. Well, let me ask you: how many SSE2 execution units
does it have?

We know AMD has 2.
P4 has 1.

AMD completely outperforms P4 there.

Why would this be different for a Pentium-M on steroids? Can you give some
explanation?

Let me give counterarguments.

a) it will have an utterly TINY L1 cache
b) it will SHARE the L2 cache, so it has a DEAD SLOW L2 cache in terms of
latency.
c) Intel also has the habit of trying to get away with a very cheap L1 cache,
with just 1 port. AMD already had 2 ports in the K7, and so does the K8.

What will Intel do this time to keep this CPU cheap to produce, while
asking gold coins when you buy it?

Who knows, perhaps intel has some good cpu now?

Let's hope so.

>Also general SSE-performance is much better for the future intels.
>Hopefully some motivation for amd to work on 128-bit alus ;-)
>Gerd



