Author: Matt Taylor
Date: 08:59:48 12/17/02
On December 17, 2002 at 11:33:36, Vincent Diepeveen wrote:

>On December 17, 2002 at 11:27:18, Matt Taylor wrote:
>
>>On December 17, 2002 at 10:10:46, Vincent Diepeveen wrote:
>>
>>>Hello,
>>>
>>>Some tests were performed in the USA, where some P4 Xeon dual 2.8Ghz
>>>systems get delivered now. In Europe we can't get them yet and
>>>most likely we don't want them either:
>>>
>>>Here are the results of DIEP at the Xeon 2.8Ghz dual ECC registered DDR ram.
>>>
>>>test 1: diep 4 processes. Of course HT enabled.
>>> 181538 nps
>>>
>>>test 2: diep 2 processes. HT enabled.
>>> 135924 nps
>>>
>>>test 3: diep 2 processes K7 1.6ghz (registered DDR ram all other settings
>>> identical to xeon dual setup):
>>> 146555
>>>
>>>THE 2 TESTS NALIMOV DIDN'T OR COULDN'T WANT TO DO WITH CRAFTY
>>>SOME WEEKS AGO REVEAL A BIG WEAKNESS OF HT/SMT:
>>>
>>>test 4: diep 2 processes. HT disabled. 171288 nps
>>>
>>>test 5 and 6: diep single cpu HT disabled and enabled were same speed
>>> 92090 nps versus 92019 nps.
>>
>>Crafty gets better results with HT, but it's been optimized for HT. It just
>
>That hasn't been proven yet.
>
>there was no test done without HT and 2 processors as far as i know.
>
>Please read how i tested it.

I'm pretty sure he did non-HT tests too.

>>means you need a personal Intel engineer to make it blazing fast for people who
>>plopped down $600 USD for a top-of-the-line Intel chip. Before long they'll
>>start selling Intel engineers in local computer shops. Collect all 18...
>
>Crafty is doing 2 probes in 2 hashtables for example. Remove it and
>improve it to 4 probes at 1 table (which is faster on both intel and
>AMD anyway, but AMD profits more because its chipset is cheaper).
>
>>HT is a good idea, and it works in practice rather than just on paper. It just
>>doesn't work for -everything-.
>
>in the factory they press 2 cpu's and put a single P4 sticker on it.
>You pay a factor 2 more, but get something 11.4% faster.
>For databases it was measured 11% rather than 11.4%.
>
>That's what i call a bad buy!

CPUs since the Pentium have been pipelined and superscalar: the goal is to spread the work across pipeline stages and execution units so you can sustain a throughput of at least 1 op/cycle. That is not always possible, particularly with complex instructions. The Pentium 4 is no different. It has an extremely long pipeline so that it can clock to higher frequencies, and the bulk of that pipeline is shared between the two "logical" CPUs: they share the caches, execution units, decoders, etc. The only thing that gets duplicated is the architectural state (mainly the register set), a small part of the CPU.

>>>First conclusion is that the system is profitting only from HT when you
>>>use 4 processes at the same time, OTHERWISE IT IS A DISADVANTAGE IF
>>>YOU MULTITHREAD, because see the big difference between 2 processes
>>>running with HT turned on and off.
>>>
>>>In itself when you have a program with just 2 threads which you
>>>run on a dual it gets slower. My assumption is that the hardware reports
>>>4 cpu's and that the software doesn't care at what cpu to schedule
>>>the processes/threads. the result of that is that there is a 33% chance
>>>that things get scheduled at a cpu which is already running a thread/process.
>>>
>>>Resulting in a system where 1 cpu idles kind of shortly and 1 cpu is running
>>>2 threads/processes.
>>>
>>>Actually the actual chance that the 2 processes are scheduled at
>>>2 different processors (there is 4 processors for the OS
>>>times 3 processors left for the second process is 12 different
>>>schedulings) is: 8/12 = 2/3 = 66%. In short there is a disaster possibility
>>>of 33%.
>>
>>Yes, when one thread is scheduled on one processor, there are 3 choices for the
>>other thread, and one is disaster. 1/3 = 33%.
>
>>>Now the absolute speed from performance viewpoint.
>>>If the system idles completely and then starts to run *exclusively* diep at 4 processors, then
>>>the measured speedup as you can calculate is in the order of 11.4% for
>>>SMT/HT.
>>>
>>>That's not so much actually. The loss by searching parallel is at most
>>>parallel applications bigger than the win of 11.4%. In case of DIEP
>>>i am on the lucky side and go for that 11.4% faster speed.
>>>
>>>Yet the sad confirmation is that the pessimistic expectation about the
>>>absolute speed is completely confirmed. This system performs (assuming
>>>lineair scaling) like a 1.98 Ghz dual K7.
>>
>>If memory is a big issue for Diep, it probably won't scale linearly as memory
>>never does.
>
>It's a bigger issue for crafty than for DIEP. I hope you realize that
>this diep version is from 25 august 2002, that beta version runs pretty ok
>at cc-NUMA machines as well.
>
>Crafty doesn't though.
>
>>>there are motherboards now which do not require registered memory and
>>>the K7 runs already quite a while at 2.0Ghz in fact. Now i don't care
>>>for XP at all here nor do i care for the P4 at all. I just care for
>>>parallel search here.
>>>
>>>If we know that a 2.0Ghz dual K7 is identical to a dual 2.8Ghz Xeon
>>>and that in the majority of cases the K7 is going to win, then considering
>>>the huge price difference, the choice would be trivial for most who
>>>are looking for a lot of computing power for little money.
>>
>>AMD has always been better price/performance. Before the huge price differences
>>in AMD and Intel chips, the AMD chips meant your old Socket 7 board could be
>>used through ~500 MHz.
>
>>>Doesn't take away the fact that the P4 is winning ground. I remember
>>>the first dual AMD 1.2ghz test versus P4 dual 1.7Ghz and the AMD dual
>>>being 20% faster.
>>>Meaning in short that the speed of a P4 was performing
>>>about 1 : 1.7
>>>
>>>Now if i compare a dual Xeon 2.8Ghz with a 2Ghz K7 then it's equal
>>>meaning the P4 is performing 1 : 1.4
>>>
>>>So that's a big step forward!
>>
>>Well just about every application saw a similar gain from the 512 KB cache
>>Northwood from the 256 KB cache Williamette. The new Xeons, as I understand,
>>have 1 MB L3 cache in -addition- to the other caches. Don't quote me there. All
>>I know is that things changed. The extra cache makes the P4 competitive whereas
>
>It's the DDR ram that speeded DIEP and crafty up a lot. Not the bigger
>cache so much.
>
>DDR ram has nearly 2 times faster latency than RDRAM.

You seem so sure, but you have never tested a Northwood on RDRAM or a Willamette on DDR SDRAM, so you can't know.

>>before P4 performance was something of an oxymoron, a joke among the people
>>who'd seen its scores, and a disappointment for former Intel fans.
>
>>You'll probably observe the trend shift (not -completely-) toward the former
>>when AMD releases Barton, likewise equipped with 512 KB of L2 cache.
>
>512KB is better than 256KB but i do not believe that the changing of just
>the cache is going to improve the thing a lot. Getting it to 0.13 and
>also clocking it at 3 Ghz will have more of an impact i bet.

The process size doesn't affect performance directly; it affects how high the CPU can be clocked. The CPU only does real work on the edges of a clock cycle, so no matter how small the core gets, a CPU fed a 1 Hz clock runs at 1 Hz, and that's pretty slow. In compute-bound applications, performance scales roughly linearly with clock speed.

However, you haven't posted Diep results across a wide variety of systems; you've only posted the four benchmarks. Little can be discerned from them except which system is faster. They yield no useful information about the architecture, how clock rate affects performance, or how RAM affects performance. The data just isn't there.
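Vincent's "1.98 GHz dual K7" and "1 : 1.4" figures follow from the quoted nps numbers only under the linear-scaling assumption discussed above. A minimal Python sketch of that arithmetic; `equivalent_clock_ghz` is a hypothetical helper, and the linear model is the assumption under dispute, not a measured fact:

```python
def equivalent_clock_ghz(ref_clock_ghz: float, ref_speed: float, target_speed: float) -> float:
    """Clock a reference CPU would need to reach target_speed, ASSUMING speed grows linearly with clock."""
    return ref_clock_ghz * target_speed / ref_speed


# Figures quoted in the thread (nps = nodes per second).
K7_CLOCK_GHZ, K7_NPS = 1.6, 146555       # dual K7, 2 Diep processes (test 3)
XEON_CLOCK_GHZ, XEON_NPS = 2.8, 181538   # dual Xeon, HT on, 4 Diep processes (test 1)

k7_equiv = equivalent_clock_ghz(K7_CLOCK_GHZ, K7_NPS, XEON_NPS)
new_ratio = XEON_CLOCK_GHZ / k7_equiv    # P4 GHz needed to match 1 K7 GHz

# Older comparison: dual K7 1.2 GHz was 20% faster than dual P4 1.7 GHz,
# i.e. relative speeds K7 = 1.2, P4 = 1.0, so the P4 behaved like a K7 at 1.0 GHz.
old_k7_equiv = equivalent_clock_ghz(1.2, 1.2, 1.0)
old_ratio = 1.7 / old_k7_equiv

print(f"dual Xeon 2.8 GHz ~ dual K7 at {k7_equiv:.2f} GHz")
print(f"K7 : P4 clock ratio now 1 : {new_ratio:.2f}, before 1 : {old_ratio:.2f}")
```

If Diep's speed is partly memory-bound, as suggested earlier in the thread, real scaling will be sublinear and these equivalences are only rough estimates.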
>>>Whether the step is because of DDR ram versus the very bad performing
>>>RDRAM (nearly 2 times slower latency) is a matter of open discussion.
>>>
>>>HT/SMT in itself is not so impressing now.
>>>
>>>It's trivial to say that it will get impressive when the P4 can split itself
>>>into 2 real processors having little dependencies on each other.
>>>
>>>Right now the single cpu win on a P4 3.06Ghz HT (18%) is
>>>clearly more than the older generation 2.8 Ghz HT/SMT. so it seems
>>>also this technique is slowly winning in realism.
>>>
>>>Right now i can't take what's getting on the market now very serious.
>>>
>>>Best regards,
>>>Vincent
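The 2/3 versus 1/3 scheduling argument in the thread can be checked by brute force. A minimal sketch, assuming two Diep processes on a dual machine with HT enabled (four logical CPUs, two per physical package) and a scheduler that picks distinct logical CPUs uniformly without regard to HT topology, which is the behaviour the thread speculates about. The CPU numbering and the `PACKAGE_OF` map are hypothetical:

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical numbering: logical CPUs 0 and 1 are HT siblings on package 0,
# logical CPUs 2 and 3 are HT siblings on package 1.
PACKAGE_OF = {0: 0, 1: 0, 2: 1, 3: 1}


def shared_package_probability(num_procs: int = 2) -> tuple[int, Fraction]:
    """Return (number of ordered placements, probability that some package runs 2+ processes).

    Assumes each process lands on a distinct logical CPU and all placements are equally likely.
    """
    placements = list(permutations(PACKAGE_OF, num_procs))
    shared = sum(
        1 for cpus in placements
        if len({PACKAGE_OF[cpu] for cpu in cpus}) < num_procs
    )
    return len(placements), Fraction(shared, len(placements))


total, p_shared = shared_package_probability()
print(f"{total} placements, P(shared package) = {p_shared}, P(separate) = {1 - p_shared}")
```

Counting only the second process's choices gives the same answer: once the first process is placed, 1 of the 3 remaining logical CPUs is its HT sibling.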
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.