Computer Chess Club Archives


Subject: Re: SURPRISING RESULTS P4 Xeon dual 2.8Ghz

Author: Vincent Diepeveen

Date: 09:08:41 12/17/02


On December 17, 2002 at 11:50:20, Matt Taylor wrote:

>On December 17, 2002 at 11:25:10, Vincent Diepeveen wrote:
>
>>On December 17, 2002 at 10:58:51, Bob Durrett wrote:
>>
>>
>>Indeed you are correctly seeing that DIEP, which runs well on
>>cc-NUMA machines as well, is a very good program from intels
>>perspective, because even a 'second' processor on each physical
>>processor which runs slower will still give it a speedboost,
>>where others simply slow down a lot when you do such toying.
>>
>>So where many programs which will be way slower when running at
>>4 processes/threads at a 2 processor Xeon, the software is the
>>weak chain.
>>
>>In case of DIEP the bottleneck is the hardware clearly. Even
>>something working great on cc-NUMA doesn't profit too much from
>>the SMT/HT junk from intel.
>
>Clearly? It seems to me that memory is your bottleneck, and logical CPUs
>obviously don't help you there.

For SMT/HT, memory isn't my bottleneck at all. The problem is that
these are not 2 real processors but something that has to wait for
the other one each time.

>>Though it is a great sales argument, the hard facts (11.4%
>>speedboost) are not lying.
>
>11.4% doesn't lie for chess, or at least for Diep. Intel didn't advertise, "Wow!
>HT will make your chess programs run faster!" Intel said HT will get an average
>of 30-40% speed gain across applications on -average-.

That is a typical marketing thing. They compare HT versus HT: 2 processes
with HT versus 4 processes with HT, instead of 2 processes without HT
versus 4 processes with HT.

If you look at DIEP's speeds you'll see that 4 processes with HT
(181538 nps) is a lot faster than 2 processes with HT (135924 nps).

That's a 33.6% speedup.

However, it is not a fair comparison. The fair comparison shows an 11.4% speedup.

What was posted from Crafty here was the unfair comparison. No fair
comparison has been posted so far.
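
For anyone who wants to redo the arithmetic, here is a minimal Python sketch. The two nps figures are the ones quoted in this post; the function names are made up for illustration. The fair baseline (2 processes with HT disabled) is not quoted in this post, so it is left as a parameter instead of being guessed.

```python
def speedup_pct(nps_new: float, nps_base: float) -> float:
    """Percentage speedup of nps_new over nps_base."""
    if nps_base <= 0:
        raise ValueError("baseline nps must be positive")
    return (nps_new / nps_base - 1.0) * 100.0


# Figures quoted in the post; both runs had HT enabled.
NPS_4_PROC_HT = 181538
NPS_2_PROC_HT = 135924

# The HT-versus-HT comparison.
ht_vs_ht = speedup_pct(NPS_4_PROC_HT, NPS_2_PROC_HT)
print(f"4 processes HT vs 2 processes HT: {ht_vs_ht:.1f}%")


def fair_speedup_pct(nps_2_proc_no_ht: float) -> float:
    """Fair comparison: 4 processes with HT against 2 processes without HT.

    The caller has to supply the measured non-HT figure; it is an
    assumption of this sketch that such a measurement is available.
    """
    return speedup_pct(NPS_4_PROC_HT, nps_2_proc_no_ht)
```

The only thing that changes between the two comparisons is the baseline, which is the whole point: the same 4-process result looks much better against an HT-enabled 2-process run.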

Who is testing objectively here?

>>So they need to press 2 cpu's which results in a cpu price
>>2 times higher *at least* than an AMD cpu, the result
>>is that you win 11.4% in speed.
>
>Intel has always charged astronomical prices for their latest CPUs. HT isn't
>driving the price up. Intel doesn't like losing profits.

>In 6 months, the Pentium 4 3.06 GHz will be in the $200-$300 range just like the
>Pentium 4 2.53 GHz is now. A year from now, it will cost $100-$200. Five years
>from now, it will be on keychains.

>>Though i am not a hardware engineer, i can imagine the problems
>>they had getting this to work.

>Yes, they had to build a mux and duplicate some components. The infrastructure
>has been there for the past 5 years.

>>Instead of a P4-Xeon cpu clocked at 2.8Ghz which can split itself
>>into 2 physical processors, i would have preferred a P3-Xeon cpu
>>which splitted itself into 2 real processors (so each having its
>>own L1 and L2 caches) clocked at 2.0Ghz.
>
>They had trouble clocking the Pentium 3 above 1 GHz. It's been run at
>frequencies from 150 MHz (the slowest Pentium Pro that I recall ever seeing, but
>perhaps not the slowest) all the way up to 1.4 GHz. A design only scales so far.
>Wouldn't it be nice if you could buy 3 GHz Athlons? Athlon just won't run at 3
>GHz. Pentium 4 does because it's designed to. Pentium 3 wasn't even designed to
>hit 1.4 GHz; it wouldn't go much further anyway.

The Athlon was only recently converted to 0.13 micron.

The reason the P4 clocks so high is that it uses such a small
L1 cache and a small trace cache (though compared to the data cache it's
huge).

What I dislike a lot is the huge branch misprediction penalty. I'm not
lying when I claim that DIEP could be sped up 2 times on the P4 if the
P4 did not have such a bad branch misprediction penalty.

I also do not understand at all why there is only 1 decoder for new instructions.

Basically the P4 is a CPU where inefficient coding gets rewarded.

If you code very badly and need a lot of extra variables and instructions
to get something done, then the number of branches stays relatively
lower than in a very efficient program which executes only a few instructions
but can't avoid a branch there because other code needs to be executed.

Replacing branches by extra instructions is simply not possible anymore,
because I already started avoiding branches wherever I could when the
Pentium Pro came out. I had that thing around the end of 1996,
if memory serves me well.

>>That would have kicked anything of course from speed viewpoint as
>>it scales 1 : 1.2 to a K7 (k7 20% faster for each Ghz than the P3).
>>
>>Now we end up with a very expensive cpu which is 1 : 1.4 and a bad
>>working form of HT/SMT.
>>
>>So it's not DIEP having a problem here. But the hardware very clearly.
>>Intel optimistically claims 20% speed boost here and there. Others
>>claim 11% for database applications.
>>
>>I see 11.4% for DIEP. So that's a market conform viewpoint.
>>
>>The not so amazing thing of this all is that a 2.8Ghz Xeon being not
>>deliverable yet here is very expensive (even a 3.06Ghz P4 is already 885
>>euro in the shops here also not yet deliverable) and the MP2200 which
>>DOES get offered for sales here is 290 euro. the fastest Xeon i see
>>getting offered socket 603 is a 2.0Ghz Xeon for 829 euro at alternate.nl
>>
>>a dual motherboard for the P4 i see here is several:
>>  789 euro for a dual xeon motherboard called: 860d pro (msi)
>>  549 euro for a tyan S2720GN is by far the cheapest i see
>>
>>then you gotta buy ecc registered DDR ram for it.
>>
>>a dual motherboard for K7 i see at the same alternate.nl is:
>>  259 euro for A7M266-D/U
>>  299 euro chaintech 7KDD (dual; U-DMA/133 RAID en sound)    AMD-762MPX
>>  289 euro tiger MPX S2466N-4M
>>
>>The last mainboard (tiger) for sure needs registered DDR ram. but lucky
>>not ECC ram.
>
>AMD is always cheaper than Intel for the same level of performance.

If you look at how huge that P4 chip is compared to the AMD chip, it is
no miracle either.

Knowing that AMD has just one 0.13 factory versus Intel's many, it is no
miracle either that this will remain the same in the future.

>Also, I own a TigerMPX S2466N-2M (only difference being that they don't mind
>telling me to eat a PCI slot for USB). At one point I only had 1 256 MB
>unregistered/non-ECC DIMM because my other 512 MB unregistered/non-ECC DIMM had
>failed. I finally replaced both with a single 1 GB Registered/ECC DIMM.
>
>If anyone wants to send me a digital camera, I'll take pretty pictures of the
>BIOS screens, my unregistered DIMM, and a working TigerMPX system on
>unregistered ram.

Not every unregistered DIMM fails in a system requiring registered
DIMMs. But I can give you the names of 3 people with problems with a Tiger
(not sure they had the MPX chipset; I guess it was the older Tiger MP760
chipset) who after a few days had severe stability problems with it and
weird crashes every week or so.

>If I'm feeling generous, I'll also take pictures of my dual-AthlonMP 2000 system
>at work.

>>the P4 dual motherboards need for sure ecc registered stuff.
>>
>>The only good news is that ddr ram ecc registered is a lightyear cheaper
>>than ecc registered RDRAM.
>>
>>RDRAM RIMM 256 MB (ValueRAM, ECC)    voor PC   PC1066   EUR 239,00
>>now you can't need 256MB at all. You need more RAM than that. which is
>>exponential more expensive i fear.
>>
>>You get better served with DDR ram though:
>>  kingston 1GB DIMM 1 GB (Registered) for PC   PC266   EUR 599,00
>>
>>It is amazing how many professors and others still throw away money
>>to get that dual 2.8Ghz P4 which is over 2 times more expensive than
>>AMD dual at the moment is.
>
>Money grows on trees for some people. It is amazing how my coworkers convinced
>management to purchase machines with Radeon 9700 Pro graphics cards for "work."
>These cards were 20% of the cost of the whole machine at around $350 USD per
>card.

right ;)

>Still, it is against social etiquette to tell people how to spend their money.
>If someone wants to throw away money, they're fully entitled to do so.
>-Matt

Obviously, but I want to get rid of the fairy tale that more expensive machines
are always better.
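
A rough Python sketch of the price side, using only the euro prices quoted earlier in this thread. Assumptions: the cheapest board listed for each platform, RAM and everything else left out, and the 2.0GHz Xeon price used because it is the only Xeon price quoted (the 2.8GHz part is not priced here). The function name is made up for illustration.

```python
# Prices in euro as quoted earlier in this thread (alternate.nl).
XEON_2_0_CPU = 829   # fastest socket 603 Xeon on offer (2.0GHz, not 2.8GHz)
XEON_BOARD = 549     # Tyan S2720GN, cheapest dual Xeon board listed
MP2200_CPU = 290     # Athlon MP2200
K7_BOARD = 259       # A7M266-D/U, cheapest dual K7 board listed


def dual_system_cost(cpu_price: int, board_price: int) -> int:
    """Cost of two CPUs plus one dual board; RAM is not included."""
    return 2 * cpu_price + board_price


xeon_cost = dual_system_cost(XEON_2_0_CPU, XEON_BOARD)
amd_cost = dual_system_cost(MP2200_CPU, K7_BOARD)
ratio = xeon_cost / amd_cost

print(f"dual Xeon 2.0GHz: {xeon_cost} euro")
print(f"dual MP2200:      {amd_cost} euro")
print(f"price ratio:      {ratio:.2f}")
```

This only compares purchase price for CPUs and board; it says nothing about nps per euro, which would need measured speeds for both machines.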

Of course there is a supercomputer league where price doesn't matter,
where prices get measured in millions rather than thousands.

In that category we don't talk about 11.4% speedups of course.

We talk about a 500 processor DIEP then, on 500 real processors :)

Yet we must be realistic and see that there's just 1 such great supercomputer
in the whole of the Netherlands, with 1024 processors (www.sara.nl and click on
'teras'; owned by NWO: www.nwo.nl).

Then I realize again why I put months of effort into rewriting DIEP for
cc-NUMA (still busy improving it!) and why I won't spend time on manuals
describing what SMT/HT is actually doing in hardware and which instructions
can get parallelized and which instructions/actions cannot.

Best regards,
Vincent





Last modified: Thu, 15 Apr 21 08:11:13 -0700
