Author: Vincent Diepeveen
Date: 09:08:41 12/17/02
On December 17, 2002 at 11:50:20, Matt Taylor wrote:

>On December 17, 2002 at 11:25:10, Vincent Diepeveen wrote:
>
>>On December 17, 2002 at 10:58:51, Bob Durrett wrote:
>>
>>
>>Indeed you are correctly seeing that DIEP, which runs well on
>>cc-NUMA machines as well, is a very good program from intels
>>perspective, because even a 'second' processor on each physical
>>processor which runs slower will still give it a speedboost,
>>where others simply slow down a lot when you do such toying.
>>
>>So where many programs which will be way slower when running at
>>4 processes/threads at a 2 processor Xeon, the software is the
>>weak chain.
>>
>>In case of DIEP the bottleneck is the hardware clearly. Even
>>something working great on cc-NUMA doesn't profit too much from
>>the SMT/HT junk from intel.
>
>Clearly? It seems to me that memory is your bottleneck, and logical CPUs
>obviously don't help you there.

For SMT/HT, memory isn't my bottleneck at all. The problem is that it's not 2 real processors, but something where each one has to wait for the other every time.

>>Though it is a great sales argument, the hard facts (11.4%
>>speedboost) are not lying.
>
>11.4% doesn't lie for chess, or at least for Diep. Intel didn't advertise, "Wow!
>HT will make your chess programs run faster!" Intel said HT will get an average
>of 30-40% speed gain across applications on -average-.

That is a typical marketing thing. They compare HT versus HT: 2 processes with HT versus 4 processes with HT, instead of 2 processes without HT versus 4 processes with HT. If you look at DIEP's speeds you'll see that 4 processes with HT (181538 nps) is a lot faster than 2 processes with HT (135924 nps). That's a 33.6% speedup. However, it is not a fair comparison. The fair comparison shows an 11.6% speedup. What was posted from Crafty here was the unfair comparison. No fair comparison has been posted so far. Who is testing objectively here?
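The arithmetic behind the two comparisons can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, the two nps figures are the ones quoted above, and the non-HT baseline is not a measurement from this post but is back-calculated from the stated 11.6% figure.

```python
def speedup_percent(baseline_nps: float, new_nps: float) -> float:
    """Percentage speedup of new_nps over baseline_nps."""
    return (new_nps / baseline_nps - 1.0) * 100.0


def implied_baseline(new_nps: float, speedup_pct: float) -> float:
    """Baseline speed that would yield the given percentage speedup."""
    return new_nps / (1.0 + speedup_pct / 100.0)


# Figures quoted in the post (nodes per second, both runs with HT enabled).
HT_2_PROCESSES_NPS = 135924
HT_4_PROCESSES_NPS = 181538

# "HT versus HT": 2 processes with HT as the baseline.
ht_vs_ht = speedup_percent(HT_2_PROCESSES_NPS, HT_4_PROCESSES_NPS)
print(f"4 processes HT vs 2 processes HT: {ht_vs_ht:.1f}% speedup")

# "Fair" comparison: the 2-process non-HT speed is NOT given in the post.
# Assumption: derive the speed it would need to be for the stated 11.6%.
STATED_FAIR_SPEEDUP_PCT = 11.6
non_ht_2_processes_nps = implied_baseline(HT_4_PROCESSES_NPS, STATED_FAIR_SPEEDUP_PCT)
print(f"Implied 2 processes non-HT speed: {non_ht_2_processes_nps:.0f} nps")
```

The point of the sketch is only that the same 4-process HT number gives a very different percentage depending on which 2-process run is taken as the baseline.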
>>So they need to press 2 cpu's which results in a cpu price
>>2 times higher *at least* than an AMD cpu, the result
>>is that you win 11.4% in speed.
>
>Intel has always charged astronomical prices for their latest CPUs. HT isn't
>driving the price up. Intel doesn't like losing profits.
>In 6 months, the Pentium 4 3.06 GHz will be in the $200-$300 range just like the
>Pentium 4 2.53 GHz is now. A year from now, it will cost $100-$200. Five years
>from now, it will be on keychains.
>
>>Though i am not a hardware engineer, i can imagine the problems
>>they had getting this to work.
>
>Yes, they had to build a mux and duplicate some components. The infrastructure
>has been there for the past 5 years.
>
>>Instead of a P4-Xeon cpu clocked at 2.8Ghz which can split itself
>>into 2 physical processors, i would have preferred a P3-Xeon cpu
>>which splitted itself into 2 real processors (so each having its
>>own L1 and L2 caches) clocked at 2.0Ghz.
>
>They had trouble clocking the Pentium 3 above 1 GHz. It's been run at
>frequencies from 150 MHz (the slowest Pentium Pro that I recall ever seeing, but
>perhaps not the slowest) all the way up to 1.4 GHz. A design only scales so far.
>Wouldn't it be nice if you could buy 3 GHz Athlons? Athlon just won't run at 3
>GHz. Pentium 4 does because it's designed to. Pentium 3 wasn't even designed to
>hit 1.4 GHz; it wouldn't go much further anyway.

The Athlon was only recently converted to 0.13 micron.

The reason the P4 clocks so high is that it uses such a small L1 cache and a small trace cache (though compared to the data cache it's huge). What I dislike a lot is the huge branch misprediction penalty. I would not be lying if I claimed that DIEP could run 2 times faster on the P4 if the P4 did not have such a bad branch misprediction penalty.

I also do not understand at all why it has just 1 decoder for new instructions. Basically the P4 is a cpu where inefficient coding gets rewarded.
If you code very badly and need a lot of extra variables and instructions to get something done, then the number of branches stays relatively lower than in a very efficient program, which executes only a few instructions but cannot avoid a branch there because other code needs to execute. Replacing more branches with extra instructions is simply not possible anymore, because I already started avoiding branches wherever I could when the Pentium Pro came out. I had that machine around the end of 1996, if memory serves me well.

>>That would have kicked anything of course from speed viewpoint as
>>it scales 1 : 1.2 to a K7 (k7 20% faster for each Ghz than the P3).
>>
>>Now we end up with a very expensive cpu which is 1 : 1.4 and a bad
>>working form of HT/SMT.
>>
>>So it's not DIEP having a problem here. But the hardware very clearly.
>>Intel optimistically claims 20% speed boost here and there. Others
>>claim 11% for database applications.
>>
>>I see 11.4% for DIEP. So that's a market conform viewpoint.
>>
>>The not so amazing thing of this all is that a 2.8Ghz Xeon being not
>>deliverable yet here is very expensive (even a 3.06Ghz P4 is already 885
>>euro in the shops here also not yet deliverable) and the MP2200 which
>>DOES get offered for sales here is 290 euro. the fastest Xeon i see
>>getting offered socket 603 is a 2.0Ghz Xeon for 829 euro at alternate.nl
>>
>>a dual motherboard for the P4 i see here is several:
>> 789 euro for a dual xeon motherboard called: 860d pro (msi)
>> 549 euro for a tyan S2720GN is by far the cheapest i see
>>
>>then you gotta buy ecc registered DDR ram for it.
>>
>>a dual motherboard for K7 i see at the same alternate.nl is:
>> 259 euro for A7M266-D/U
>> 299 euro chaintech 7KDD (dual; U-DMA/133 RAID en sound) AMD-762MPX
>> 289 euro tiger MPX S2466N-4M
>>
>>The last mainboard (tiger) for sure needs registered DDR ram. but lucky
>>not ECC ram.
>
>AMD is always cheaper than Intel for the same level of performance.
If you look at how huge the P4 chip is compared to the AMD chip, that is no miracle. And knowing that AMD has just one 0.13 micron factory versus Intel's many, it is no miracle either that this will remain the same in the future.

>Also, I own a TigerMPX S2466N-2M (only difference being that they don't mind
>telling me to eat a PCI slot for USB). At one point I only had 1 256 MB
>unregistered/non-ECC DIMM because my other 512 MB unregistered/non-ECC DIMM had
>failed. I finally replaced both with a single 1 GB Registered/ECC DIMM.
>
>If anyone wants to send me a digital camera, I'll take pretty pictures of the
>BIOS screens, my unregistered DIMM, and a working TigerMPX system on
>unregistered ram.

Not all unregistered DIMMs fail in a system requiring registered DIMMs. But I can give you the names of 3 persons with a Tiger (not sure they had the MPX chipset though; I guess it was the older Tiger MP760 chipset) who after a few days had severe stability problems with it, and weird crashes each week or so.

>If I'm feeling generous, I'll also take pictures of my dual-AthlonMP 2000 system
>at work.
>
>>the P4 dual motherboards need for sure ecc registered stuff.
>>
>>The only good news is that ddr ram ecc registered is a lightyear cheaper
>>than ecc registered RDRAM.
>>
>>RDRAM RIMM 256 MB (ValueRAM, ECC) voor PC PC1066 EUR 239,00
>>now you can't need 256MB at all. You need more RAM than that. which is
>>exponential more expensive i fear.
>>
>>You get better served with DDR ram though:
>> kingston 1GB DIMM 1 GB (Registered) for PC PC266 EUR 599,00
>>
>>It is amazing how many professors and others still throw away money
>>to get that dual 2.8Ghz P4 which is over 2 times more expensive than
>>AMD dual at the moment is.
>
>Money grows on trees for some people. It is amazing how my coworkers convinced
>management to purchase machines with Radeon 9700 Pro graphics cards for "work."
>These cards were 20% of the cost of the whole machine at around $350 USD per
>card.
Right ;)

>Still, it is against social ettiquite to tell people how to spend their money.
>If someone wants to throw away money, they're fully entitled to do so.
>-Matt

Obviously, but I want to get rid of the fairy tale that more expensive machines are always better.

Of course there is a supercomputer league where price doesn't matter, where prices get measured in millions rather than thousands. In that category we don't talk about 11.4% speedups, of course. There we talk about a 500 processor DIEP on 500 real processors :)

Yet we must be realistic and see that there's just one such great supercomputer in the whole Netherlands, with 1024 processors (www.sara.nl and click on the 'teras'; owned by NWO: www.nwo.nl).

Then I realize again why I put months of effort into rewriting DIEP for cc-NUMA (still busy improving it!) and why I won't spend time on manuals describing what SMT/HT actually does in hardware and which instructions can be parallelized and which instructions/actions cannot.

Best regards,
Vincent
Last modified: Thu, 15 Apr 21 08:11:13 -0700