Author: Vincent Diepeveen
Date: 17:02:10 04/28/05
On April 28, 2005 at 13:51:02, Robert Hyatt wrote:

>On April 28, 2005 at 07:52:27, Vincent Diepeveen wrote:
>
>> http://www.sudhian.com/showdocs.cfm?aid=667&pid=2543
>>
>>In world champs 2004 at quad opteron 2.0Ghz diep got about 380k nps in
>>openingsposition.
>>
>>529k nps at dual core dual opteron. It's just incredible. Scaling 3.92
>>Quad opteron world champs 2004 scaled 3.93
>>
>>This was a net2003 executable compiled in 32 bits. Intel c++ compiles were
>>slower.
>
>
>I think their data or test is a bit broken. I have dual PIV's here, both my

Ah, the usual P4 confusion, deliberately created by Intel.

Your old P4 Xeon is still one of the good Xeons, with a 2 cycle L1 cache at a 100 watt power budget.

The Prescott/Nocona and also these sucking dual core P4s have a 4 cycle L1 cache. That is 2 times slower. This one is called a P4EE dual core. In fact the real old P4EE, which is single core only and runs up to 3.47GHz, has a 2 cycle L1, but these new "P4EE" dual cores have a 4 cycle L1.

In addition they can do just 1 read at a time, and there are any number of other disadvantages to the new cores, all just to keep within the power budget. Single core CPUs like these are already close to or just over 100 watts. The dual core P4 of course can't be produced with a 200 watt budget; it also has to stay within more or less 100 watts, and meanwhile Intel wants to produce them cheaply (high yields).

So that's why these new P4 Prescotts and dual core P4s suck so much, to put it very, very politely.

And we haven't even discussed the real major L1/L2 <==> memory bandwidth problem that the dual core P4 gets as a result of SMT and the lack of power budget.

From a performance viewpoint Intel is ancient history. The old P4 cores are way faster than the new ones for the majority of software.

Because 1 core is so much slower, and the available bandwidth of just 1 CPU must now be shared by 4 logical processors, that means massive problems. With just a little bit of hardware paper math you can prove that to yourself very easily.
Even then, 188k nps is not too bad for 4 dead slow logical processors all sharing 1 poor memory controller. We already knew that the new Prescott 3.2GHz was just above 90k nps, so getting double the speed now with a dual core P4 is about what we could expect.

Only when it gets compared to a dual core dual Opteron does the real difference become clear.

We should instead ask ourselves questions like: what speedup do you actually get out of the total nps that 8 logical cores give on a dual Xeon of the dual core P4 type?

Well, we get the answer to that in 2006, I guess. Don't wait for it.

>xeon, and a dual 3.6 in the lab. My scaling from 1 to 2 threads (no
>hyper-threading turned on) is way better than 1.67X using 2 threads.
>
>For example, on my dual xeon, a quick test produced 1.51M nps using a single
>processor, 2.89M using both. Which is a 1.91X increase in NPS.
>
>On a 4-way 850, I have large tests that produce 2.32M for one cpu, 8.69M for
>four cpus, which is 3.75X faster. Comparing apples to apples, the dual 850 NPS
>was 4.53M, which turns into 1.95X faster. So for me, the dual xeon and the dual

You are comparing a 64 bit Opteron with a 32 bit Intel. That's not a very fair comparison IMHO, even though I'm sure you are not interested in knowing Intel's 64 bit speed. It will scale badly of course.

>opteron produce almost identical NPS scaling, although the opteron is about 2x
>faster per cpu...

Please realize that the posted results are 32 bit and that all those compilers are P4 compilers; they do not take advantage of things that are fast on the Opteron.

Take for example all kinds of conditional instructions: 2 cycles on the Opteron, 7+ on the P4.

In 64 bits the Opteron of course gets more of a speedup than Intel does with its tiny, slow caches. Conditional optimizations, which compilers don't do yet because they are slow on the P4, would speed up the Opteron even more for our branchy chess programs.

Intel's real problem for computer chess is its small, slow, 1 port L1 cache.
The huge branch misprediction penalty isn't very helpful either, and replacing branches with conditional move type instructions isn't an easy way out, as in general those are dead slow on it too. But even that is still not its really big problem.

Intel's known plans to modify the Pentium-M a bit in order to release it for the desktop market are not very impressive either. It also has an L1 cache half the size of the Opteron's.

So they can hang all kinds of jingles and Christmas bells on it and give it a bigger power budget, but it's not very promising for computer chess either, especially if we realize it won't be ready to be presented before the end of 2006. That means it gets sold somewhere in 2007.

Intel has simply lost the performance battle for desktop processors until the start of 2007, and objective comparisons like these show it really clearly.

Even if Intel comes with a good replacement for the P4 by the end of 2006, by then every normal computer user will have realized that Intel sucks and AMD is fast, so the damage Intel has done to itself in the desktop market is long term.

The performance difference is just too huge, and it will only get bigger thanks to all kinds of problems Intel will have in the coming 2 years. To give a few examples:

- 64 bits will help AMD more
- when compilers improve and optimize better for AMD than they do now, there will be an even bigger speed win
- when trying to clock higher, AMD still can clock higher; Intel is already far over any decent wattage a processor can have

In short, only if Intel manages to create some sort of Cell type processor that is also fast for integer workloads (IBM/Sony's Cell processor doesn't have branch prediction, for example, which is kind of harakiri for chess programs) does Intel stand a chance to come back in 2007.

In all other cases they had better start firing 60% of their personnel, starting with the philosopher who caused Intel to be the ONLY manufacturer with tiny L1 caches.
However, if Intel currently had such a project and it was developing in a positive way, they would already be bragging about it in all the newspapers.

Look at IBM, Sun, AMD, Transmeta etc. They all have WAY bigger L1 caches than Intel's Itanium and P4 have, with the Pentium-M and its 64KB L1 sitting in the middle of the pack.

Oh, for those interested: the original expectation was that the dual core Xeon MPs / dual core Opteron 8xx's would be in the shops by Q1 2006.

I'm not going to wait for that; I've already thrown away all the "Intel inside" stickers here.
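
For anyone who wants to redo the scaling arithmetic on the numbers quoted in this thread, here is a minimal Python sketch. The helper names `scaling` and `efficiency` are made up for illustration and come from neither poster. The NPS figures are the ones quoted above. Using 90k nps as the Prescott baseline is an assumption, since the post only says "just above 90k", so that last ratio is a rough upper bound.

```python
def scaling(nps_multi, nps_single):
    """NPS scaling factor: multi-cpu NPS divided by single-cpu NPS."""
    return nps_multi / nps_single


def efficiency(factor, n_cpus):
    """Fraction of ideal linear scaling reached on n_cpus processors."""
    return factor / n_cpus


# (label, multi-cpu NPS, single-cpu NPS, number of cpus), as quoted in the thread
hyatt_figures = [
    ("dual xeon", 2.89e6, 1.51e6, 2),
    ("4-way 850", 8.69e6, 2.32e6, 4),
    ("dual 850", 4.53e6, 2.32e6, 2),
]

for label, multi, single, n_cpus in hyatt_figures:
    factor = scaling(multi, single)
    print(f"{label}: {factor:.2f}X on {n_cpus} cpus, "
          f"{efficiency(factor, n_cpus):.0%} of linear")

# Scaling factors stated directly in the thread for 4 cores
for label, factor in [("dual core dual opteron", 3.92), ("quad opteron 2004", 3.93)]:
    print(f"{label}: {factor:.2f}X on 4 cores, {efficiency(factor, 4):.0%} of linear")

# Dual core P4 versus a single Prescott 3.2GHz (assumed baseline: 90k nps)
print(f"dual core P4 vs prescott: {scaling(188e3, 90e3):.2f}X")
```

The first three ratios reproduce the 1.91X, 3.75X and 1.95X quoted above. Note that these are NPS ratios only; they say nothing about the actual search speedup, which is the question raised earlier in this post.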
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.