Computer Chess Club Archives


Subject: Re: SURPRISING RESULTS P4 Xeon dual 2.8Ghz

Author: Vincent Diepeveen

Date: 08:33:36 12/17/02


On December 17, 2002 at 11:27:18, Matt Taylor wrote:

>On December 17, 2002 at 10:10:46, Vincent Diepeveen wrote:
>
>>Hello,
>>
>>Some tests were performed in the USA, where some dual P4 Xeon 2.8 GHz
>>systems are being delivered now. In Europe we can't get them yet, and
>>most likely we don't want them either.
>>
>>Here are the results of DIEP on the dual Xeon 2.8 GHz with ECC registered DDR RAM.
>>
>>test 1: diep 4 processes. Of course HT enabled.
>>   181538 nps
>>
>>test 2: diep 2 processes. HT enabled.
>>   135924 nps
>>
>>test 3: diep 2 processes, dual K7 1.6 GHz (registered DDR RAM, all other
>>        settings identical to the dual Xeon setup):
>>   146555 nps
>>
>>THE 2 TESTS NALIMOV DIDN'T WANT TO OR COULDN'T DO WITH CRAFTY
>>SOME WEEKS AGO REVEAL A BIG WEAKNESS OF HT/SMT:
>>
>>test 4: diep 2 processes. HT disabled.
>>   171288 nps
>>
>>test 5 and 6: diep single cpu, HT disabled and enabled, were the same speed:
>>   92090 nps versus 92019 nps
>
>Crafty gets better results with HT, but it's been optimized for HT. It just

That hasn't been proven yet.

As far as I know, no test was done with 2 processors and HT disabled.

Please read how I tested it.

>means you need a personal Intel engineer to make it blazing fast for people who
>plopped down $600 USD for a top-of-the-line Intel chip. Before long they'll
>start selling Intel engineers in local computer shops. Collect all 18...

Crafty does 2 probes in 2 hash tables, for example. Replace that with
4 probes in 1 table (which is faster on both Intel and AMD anyway,
though AMD profits more because its chipset is cheaper).
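
A minimal sketch of the "4 probes in 1 table" layout, in Python for readability only. It is not Crafty's or DIEP's actual code: the names `Entry`, `BucketTable` and `BUCKET_SIZE`, and the replace-the-shallowest-entry policy, are assumptions made for illustration. The usual rationale for such a layout (not spelled out in the post) is that the 4 slots of a bucket sit next to each other, so one lookup touches one region of memory instead of two separate tables.

```python
from dataclasses import dataclass
from typing import List, Optional

BUCKET_SIZE = 4  # 4 probes in 1 table, as suggested in the post


@dataclass
class Entry:
    key: int = 0      # full position hash (hypothetical 64-bit key)
    depth: int = -1   # search depth; -1 marks an empty slot
    score: int = 0


class BucketTable:
    """Single hash table whose buckets hold BUCKET_SIZE adjacent slots."""

    def __init__(self, num_buckets: int) -> None:
        # A power-of-two bucket count lets a bit mask select the bucket.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.slots: List[Entry] = [
            Entry() for _ in range(num_buckets * BUCKET_SIZE)
        ]

    def _bucket_start(self, key: int) -> int:
        return (key & self.mask) * BUCKET_SIZE

    def probe(self, key: int) -> Optional[Entry]:
        # Check the 4 adjacent slots of the one bucket this key maps to.
        start = self._bucket_start(key)
        for i in range(start, start + BUCKET_SIZE):
            entry = self.slots[i]
            if entry.depth >= 0 and entry.key == key:
                return entry
        return None

    def store(self, key: int, depth: int, score: int) -> None:
        # Reuse an empty slot or the slot already holding this key;
        # otherwise overwrite the shallowest entry (assumed policy).
        start = self._bucket_start(key)
        victim = start
        for i in range(start, start + BUCKET_SIZE):
            entry = self.slots[i]
            if entry.depth < 0 or entry.key == key:
                victim = i
                break
            if entry.depth < self.slots[victim].depth:
                victim = i
        self.slots[victim] = Entry(key, depth, score)
```

Python does not model cache lines or memory layout, so this only shows the indexing and replacement logic; any speed difference would have to be measured in the engine's own language.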

>HT is a good idea, and it works in practice rather than just on paper. It just
>doesn't work for -everything-.

In the factory they press 2 CPUs and put a single P4 sticker on it.
You pay a factor of 2 more, but get something 11.4% faster. For databases
the measured gain was 11% rather than 11.4%.

That's what I call a bad buy!

>>The first conclusion is that the system profits from HT only when you
>>use 4 processes at the same time. OTHERWISE IT IS A DISADVANTAGE IF
>>YOU MULTITHREAD: see the big difference between 2 processes
>>running with HT turned on and off.
>>
>>A program with just 2 threads actually gets slower when you run it on
>>a dual with HT enabled. My assumption is that the hardware reports
>>4 CPUs and that the software doesn't care on which CPU it schedules
>>the processes/threads. The result is a 33% chance that a process/thread
>>gets scheduled on a CPU which is already running a thread/process.
>>
>>That results in a system where 1 CPU mostly idles and 1 CPU is running
>>2 threads/processes.
>>
>>The actual chance that the 2 processes are scheduled on 2 different
>>physical processors is 8/12 = 2/3, about 67%: the OS sees 4 processors
>>for the first process times 3 processors left for the second, which is
>>12 different schedulings, and 8 of those use both physical processors.
>>In short, there is a disaster probability of 33%.
>
>Yes, when one thread is scheduled on one processor, there are 3 choices for the
>other thread, and one is disaster. 1/3 = 33%.
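
The 12 schedulings counted above can be enumerated directly. This is a small sketch under the post's own assumption that the OS places the 2 processes on 4 logical CPUs uniformly at random; the mapping `PHYSICAL_OF` (which logical CPUs share a physical CPU) is a hypothetical numbering chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Assumed topology: 2 physical CPUs, each exposing 2 logical CPUs via HT.
# Logical CPUs 0 and 1 share physical CPU 0; 2 and 3 share physical CPU 1.
PHYSICAL_OF = {0: 0, 1: 0, 2: 1, 3: 1}


def collision_probability() -> Fraction:
    """Chance that 2 processes land on the same physical CPU."""
    # Ordered placements of 2 processes on distinct logical CPUs: 4 * 3 = 12.
    placements = list(permutations(PHYSICAL_OF, 2))
    collisions = [
        (a, b) for a, b in placements if PHYSICAL_OF[a] == PHYSICAL_OF[b]
    ]
    return Fraction(len(collisions), len(placements))


print(collision_probability())  # 1/3
```

A scheduler that is aware of the HT topology would not place processes uniformly at random, so the 1/3 figure only holds under that assumption.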

>>Now the absolute speed from a performance viewpoint. If the system idles
>>completely and then starts to run *exclusively* DIEP on 4 processors, then
>>the measured speedup, as you can calculate, is on the order of 11.4% for
>>SMT/HT.
>>
>>That's not so much actually. For most parallel applications, the loss
>>from searching in parallel is bigger than the win of 11.4%. In the case
>>of DIEP I am on the lucky side and get that 11.4% faster speed.
>>
>>Yet the sad part is that the pessimistic expectation about the
>>absolute speed is completely confirmed. This system performs (assuming
>>linear scaling) like a 1.98 GHz dual K7.
>
>If memory is a big issue for Diep, it probably won't scale linearly as memory
>never does.

It's a bigger issue for Crafty than for DIEP. I hope you realize that
this DIEP version is from 25 August 2002; that beta version runs pretty OK
on cc-NUMA machines as well.

Crafty doesn't, though.
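
The 1.98 GHz dual K7 figure quoted above can be reproduced from the posted node counts. A short sketch, assuming (as the post does) that K7 speed scales linearly with clock; the constant and function names are made up for illustration.

```python
# Numbers taken from the test results in the quoted post.
DIEP_NPS_XEON_4PROC = 181538   # test 1: dual Xeon 2.8 GHz, 4 processes, HT on
DIEP_NPS_K7_2PROC = 146555     # test 3: dual K7 1.6 GHz, 2 processes
K7_CLOCK_GHZ = 1.6
XEON_CLOCK_GHZ = 2.8


def equivalent_k7_clock(xeon_nps: float, k7_nps: float, k7_clock_ghz: float) -> float:
    """K7 clock that would match the Xeon's nps, assuming linear scaling."""
    return k7_clock_ghz * xeon_nps / k7_nps


equiv_ghz = equivalent_k7_clock(DIEP_NPS_XEON_4PROC, DIEP_NPS_K7_2PROC, K7_CLOCK_GHZ)
clock_ratio = XEON_CLOCK_GHZ / equiv_ghz  # Xeon GHz needed per K7 GHz

print(f"equivalent dual K7 clock: {equiv_ghz:.2f} GHz")  # 1.98 GHz
print(f"K7 : P4 clock ratio: 1 : {clock_ratio:.1f}")      # 1 : 1.4
```

As the reply above points out, memory-bound programs may not scale linearly with clock, so the equivalent clock is an estimate rather than a measurement.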

>>There are motherboards now which do not require registered memory, and
>>the K7 has in fact been running at 2.0 GHz for quite a while. I don't care
>>about XP at all here, nor do I care about the P4 at all. I just care about
>>parallel search here.
>>
>>If we know that a 2.0Ghz dual K7 is identical to a dual 2.8Ghz Xeon
>>and that in the majority of cases the K7 is going to win, then considering
>>the huge price difference, the choice would be trivial for most who
>>are looking for a lot of computing power for little money.
>
>AMD has always been better price/performance. Before the huge price differences
>in AMD and Intel chips, the AMD chips meant your old Socket 7 board could be
>used through ~500 MHz.

>>That doesn't take away the fact that the P4 is gaining ground. I remember
>>the first test of a dual AMD 1.2 GHz versus a dual P4 1.7 GHz, with the
>>AMD dual being 20% faster. In short, clock for clock the P4 was performing
>>at about 1 : 1.7.
>>
>>Now if I compare a dual Xeon 2.8 GHz with a 2 GHz K7, then it's equal,
>>meaning the P4 is performing at 1 : 1.4.
>>
>>So that's a big step forward!
>
>Well, just about every application saw a similar gain going from the 256 KB
>cache Willamette to the 512 KB cache Northwood. The new Xeons, as I understand,
>have 1 MB L3 cache in -addition- to the other caches. Don't quote me there. All
>I know is that things changed. The extra cache makes the P4 competitive whereas

It's the DDR RAM that sped up DIEP and Crafty a lot, not so much the
bigger cache.

DDR RAM has nearly 2 times lower latency than RDRAM.

>before P4 performance was something of an oxymoron, a joke among the people
>who'd seen its scores, and a disappointment for former Intel fans.

>You'll probably observe the trend shift (not -completely-) toward the former
>when AMD releases Barton, likewise equipped with 512 KB of L2 cache.

512 KB is better than 256 KB, but I do not believe that changing just
the cache is going to improve the thing a lot. Getting it to 0.13 micron
and also clocking it at 3 GHz will have more of an impact, I bet.

>>Whether the step is due to DDR RAM versus the very badly performing
>>RDRAM (nearly 2 times slower latency) is a matter of open discussion.
>>
>>HT/SMT in itself is not so impressive now.
>>
>>It's trivial to say that it will get impressive when the P4 can split itself
>>into 2 real processors having few dependencies on each other.
>>
>>Right now the single-CPU win on a P4 3.06 GHz with HT (18%) is
>>clearly more than on the older-generation 2.8 GHz with HT/SMT, so it seems
>>this technique is also slowly becoming more realistic.
>>
>>Right now I can't take what's coming onto the market very seriously.
>>
>>Best regards,
>>Vincent



Last modified: Thu, 15 Apr 21 08:11:13 -0700