Author: gerold daniels
Date: 17:09:36 11/09/05
On November 09, 2005 at 13:48:13, Robert Hyatt wrote:

>On November 08, 2005 at 22:25:15, Vincent Diepeveen wrote:
>
>>On November 08, 2005 at 16:23:22, Gerd Isenberg wrote:
>>
>>>On November 08, 2005 at 15:26:45, Vincent Diepeveen wrote:
>>>
>>>>On November 08, 2005 at 13:17:42, Yar wrote:
>>>>
>>>>>Hello,
>>>>>
>>>>>Here is a review (14 pages in total) of Intel's upcoming Xeon 5000
>>>>>(Dempsey). Sorry, it's only in German. It seems it's faster than the
>>>>>Opteron 280.
>>>>>http://www.tecchannel.de/server/hardware/432957/
>>>>>
>>>>>With best regards,
>>>>>
>>>>>Yar
>>>>
>>>>That Dempsey should be a fast CPU. However, that Xeon will arrive around
>>>>January 2007, and I guess it will cost around 5000 euro per CPU in the
>>>>quad version, if you can get it for that, as you'll probably have to buy
>>>>1000 at a time to get them for around 4500 dollars apiece.
>>>>
>>>>So effectively a quad dual-core Xeon will cost around $40k in January 2007.
>>>>
>>>>By that time, of course, a quad quad-core Opteron will be nearly 2 times
>>>>faster and exactly half the price.
>>>>
>>>>Please note that it's not certain whether the IPC of the Intel Pentium-M at
>>>>such high clock speeds and as a dual core will be better than AMD's. I'm
>>>>counting on it being a lot slower, because in order to clock the Pentium-M
>>>>higher, Intel will need to make the pipeline longer and will probably move
>>>>from a 2-cycle L1 to a 3-cycle L1, in which case the processor is similar to
>>>>the Opteron from a chess programming viewpoint.
>>>>
>>>>Of course the Xeons have bigger on-chip L2 or even L3 caches than AMD.
>>>>That's nice for certain applications that are in benchmarks, but in real
>>>>life it's not a huge advantage.
>>>>
>>>>A few MB is plenty for computer chess at the moment.
>>>>
>>>>On the other hand, could you tell me whether this Xeon has an on-die memory
>>>>controller or not?
>>>>
>>>>Because *that* matters a lot.
>>>>Hash tables are a matter of TLB-thrashing memory latencies to a big hash
>>>>table. With 64-bit CPUs and the clock that keeps ticking, RAM sizes will
>>>>increase too, meaning that the latencies you lose to TLB thrashing
>>>>(transposition table, eval table, not so much the pawn table, as that'll
>>>>be in L2 cache for the majority of accesses) are significant.
>>>>
>>>>If Intel plans to do that via some sort of off-chip chipset, then that is a
>>>>huge drawback of this Xeon CPU for databases and chess. In database
>>>>benchmarks, using some small database, they can get away with a big L2/L3,
>>>>but in real life there is no escape there. It's just dead slow.
>>>>
>>>>So I do look forward to the Pentium-M, but the price at which Intel usually
>>>>sells good CPUs doesn't mean that we will see more quads online.
>>>>
>>>>Vincent
>>>
>>>
>>>Yes, memory latency seems worse.
>>>
>>>OTOH Intel has more than two times better bandwidth using 128-bit SSE2/3
>>>load/store instructions, which is of course not so important for computer
>>>chess.
>>>
>>>Cache/Memory: 128-bit transfer
>>>Bandwidth in MByte/s
>>>
>>>          Dempsey   Paxville   Opteron 280
>>>L1          47340      41444         18360
>>>L2          24928      22105          9448
>>>Memory       3606       4127          3316
>>
>>That's of course just paper.
>>
>>First of all, in a quad machine, 8 Intel cores must share 3 GB/s of memory
>>bandwidth, which is *theoretical* bandwidth.
>>
>>Whereas the 8 cores in a quad Opteron have 4 memory controllers. So that's a
>>factor 4 advantage to Opteron there in memory bandwidth.
>>
>>I didn't read bandwidth specs for the L1 and L2 caches of the Intel chips.
>>
>>May I remind you that they had similar pie-in-the-sky predictions for the P4
>>in the past. It would have a 2-cycle L1 cache, bla bla.
>>
>>Prescott actually has a 4-cycle L1 cache.
>>
>>P4 would execute 4 instructions a cycle, because of having 2 double-clocked
>>integer units.
>>
>>Its practical limitations actually limit it to 3 instructions a cycle, and
>>nearly no one can get that, thanks to other limitations.
>>
>>We should all use CMOV constructs, says Intel, to avoid branch
>>mispredictions.
>>
>>Actually their own compiler doesn't generate them when using P4 switches,
>>because on Prescott a CMOV is at least a 7-cycle penalty, versus 2 for AMD.
>>
>>So you can quote anything on paper here. The reality can be expressed in
>>money very easily.
>>
>>Namely, those Xeon chips can never compete in terms of price against
>>quad-core Opterons, which will be on the market long before the DUAL core
>>Xeon is there.
>>
>>How can 4 cores from AMD ever be slower than 2 from Intel?
>>
>>If you plan to stream, for example, SSE2 to processors executing all kinds
>>of code, then obviously 4 cores from AMD will always beat 2 cores from Intel.
>>
>>Especially if the AMD ones have already been running for months while the
>>Intels are still in the factory on a sheet of paper.
>>
>>Please realize that in terms of bandwidth for gflop calculations, memory is
>>the bottleneck. If 4 CPUs (8 cores) from Intel can get at most 3 gigabytes a
>>second, then obviously AMD will always win when it can stream 12 gigabytes a
>>second to its cores.
>>
>>Whether on paper Intel can receive 3.6GB and AMD on paper can receive 3.3GB a
>>second is not really relevant.
>>
>>It's 1 memory controller for Intel, versus 4 for AMD.
>
>
>That is if you have four processors. But the dual cores are sharing one
>controller, and the dual cores most definitely compete for that one
>HyperTransport interface also, since the multiple cache controllers and
>processor cores place a high demand on a single path (per chip, not per core)
>to memory. And then there is the issue of NUMA memory, which also reduces that
>high theoretical AMD bandwidth significantly...
>
>The dual-core chips are _not_ as good as two single-core chips, based on lots
>of benchmarking. They are very good, don't get me wrong, but the shared
>HyperTransport means 2x the traffic through one external interface, which can
>and does produce a bottleneck.
>

Thanks for clearing this up, Robert.

>
>>
>>Now for games that are multithreaded and do SSE2 calculations, as in all
>>kinds of graphics and such, that memory performance is a big performance hit.
>>
>>Additionally, bandwidth in L1 cache for chess will be dominated by the
>>LATENCY of getting a single doubleword out of L1 and the number of reads you
>>can do simultaneously there.
>>
>>I remember the optimistic specs from Intel in the past. They were not true.
>>
>>What will be the Achilles' heels this time?
>>
>>If there aren't any, it's a killer CPU for software that doesn't need RAM!
>>
>>If there are Achilles' heels again, then Intel has a major problem.
>>
>>But I do realize the price of those CPUs. Just look at the size of the L2
>>cache!
>>
>>What was it, 16MB or something?
>>
>>That's not gonna be CHEAP.
>
>Would not speculate there. As FAB sizes go down, transistor count goes up, with
>no increase in cost at all. It used to be "how can we squeeze all this stuff
>(L1/L2/floating point/multiple pipes/etc) into this small number of
>transistors?" It is now more of "what on earth can we use all these transistors
>for?" At 6 transistors per bit for SRAM, a megabyte requires 6M transistors,
>which is chickenfeed...
>
>
>
>>
>>So whatever its performance, it won't be able to compete against AMD in that
>>sense.
>>
>>You wonder about SSE2 here. Well, let me ask you, how many SSE2 execution
>>units does it have?
>>
>>We know AMD has 2.
>>P4 has 1.
>>
>>AMD completely outperforms P4 there.
>>
>>Why would this be different for a Pentium-M on steroids? Can you give some
>>explanation?
>>
>>Let me give counterarguments.
>>
>>a) it will have an utterly TINY L1 cache
>>b) it will SHARE the L2 cache, so it has a DEAD SLOW L2 cache in terms of
>>latency.
>
>Shared L2 is not necessarily bad. On AMD the MOESI traffic can get _very_ high
>if the two processor cores are modifying data that is shared...
>
>
>>c) Intel has the habit of trying to get away with a very cheap L1 cache too,
>>and just putting 1 port in it. AMD already had 2 ports on the K7, and so has
>>the K8.
>>
>>What will Intel do this time to keep this CPU cheap to produce, while asking
>>gold coins when you buy it?
>>
>>Who knows, perhaps Intel has a good CPU now?
>>
>>Let's hope so.
>>
>>>Also, general SSE performance is much better for the future Intels.
>>>Hopefully some motivation for AMD to work on 128-bit ALUs ;-)
>>>Gerd
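
For anyone who wants to redo the arithmetic behind two of the quoted claims, here is a rough sketch in Python. It is only back-of-the-envelope: it takes the "Memory" row of Gerd's table as a per-controller figure, assumes (as Vincent does) one shared controller for the quad Xeon versus one per socket for the quad Opteron, and assumes perfect scaling, so it ignores the NUMA and shared-HyperTransport effects Robert describes. The function names are made up for illustration. On the SRAM side, counting 8 bits per byte, 6 transistors per bit gives roughly 50M transistors per megabyte; the 6M figure in the quote matches one megabit.

```python
# Back-of-the-envelope arithmetic for figures quoted in the thread.
# Assumptions (taken from the posts, not measured here):
#   - quad Xeon: 8 cores behind ONE shared memory controller
#   - quad Opteron: 8 cores, one memory controller per socket (4 total)
#   - bandwidth scales perfectly with controllers (no NUMA penalty)
#   - 6 transistors per SRAM bit, cell array only (no tags or decoders)

def per_core_bandwidth(mb_per_s_per_controller, controllers, cores):
    """Peak memory bandwidth available per core, in MB/s."""
    return mb_per_s_per_controller * controllers / cores


def sram_transistors(size_bytes, transistors_per_bit=6):
    """Transistor count for the SRAM cells of a cache of the given size."""
    return size_bytes * 8 * transistors_per_bit


MIB = 1024 * 1024

# "Memory" row of the table: Dempsey 3606 MB/s, Opteron 280 3316 MB/s.
xeon = per_core_bandwidth(3606, controllers=1, cores=8)
opteron = per_core_bandwidth(3316, controllers=4, cores=8)

print(f"Quad Xeon, per core:    {xeon:.1f} MB/s")
print(f"Quad Opteron, per core: {opteron:.1f} MB/s")
print(f"Opteron/Xeon ratio:     {opteron / xeon:.2f}")

print(f"1 MiB of SRAM:  {sram_transistors(MIB):,} transistors")
print(f"16 MiB of SRAM: {sram_transistors(16 * MIB):,} transistors")
```

Under these assumptions the per-core gap comes out a bit below Vincent's "factor 4", because the single-controller Xeon figure in the table is slightly higher than the Opteron one. Real numbers will be lower on both sides for the reasons given in Robert's reply.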
Last modified: Thu, 15 Apr 21 08:11:13 -0700