Computer Chess Club Archives


Subject: Re: diep tested at dual core AMD/INTEL

Author: Vincent Diepeveen

Date: 17:02:10 04/28/05


On April 28, 2005 at 13:51:02, Robert Hyatt wrote:

>On April 28, 2005 at 07:52:27, Vincent Diepeveen wrote:
>
>> http://www.sudhian.com/showdocs.cfm?aid=667&pid=2543
>>
>>In world champs 2004 at quad opteron 2.0Ghz diep got about 380k nps in
>>openingsposition.
>>
>>529k nps at dual core dual opteron. It's just incredible. Scaling 3.92
>>Quad opteron world champs 2004 scaled 3.93
>>
>>This was a net2003 executable compiled in 32 bits. Intel c++ compiles were
>>slower.
>
>
>I think their data or test is a bit broken.  I have dual PIV's here, both my

Ah, the usual P4 confusion, deliberately created by Intel.

Your old P4 Xeon is still one of the good Xeons, with a 2-cycle L1 cache at a
100 watt power budget.

The Prescott/Nocona and also these lousy dual core P4s have a 4-cycle L1 cache.
That is 2 times slower.

This one is called a P4EE dual core. In fact the real old P4EE, which is single
core only and runs up to 3.47GHz, has a 2-cycle L1, but these new "P4EE" dual
cores have a 4-cycle L1.

Additionally, they can each do just 1 read at a time, and there are a number of
other disadvantages to the new cores, all just to stay within the power budget.

Such single core CPUs are already right at or just over 100 watts. The dual core
P4 of course can't be produced with a 200 watt budget; it also has to stay
within roughly 100 watts, and meanwhile they want to produce them cheaply
(high yields).

So that's why these new P4 Prescotts and dual core P4s suck so much, to put it
very, very politely.

We haven't even discussed the really major L1/L2 <==> memory bandwidth problem
that the dual core P4 gets as a result of SMT and the lack of power budget.

From a performance viewpoint Intel is ancient history. The old P4 cores are way
faster than the new ones for the majority of software.

Because each core is so much slower, and the available bandwidth of just 1 CPU
must now be shared by 4 logical processors, that means massive problems.

With just a little bit of hardware papermath you can prove that to yourself very
easily.

Even then, 188k nps is not too bad for 4 dead slow logical processors all
sharing 1 poor memory controller. We already knew that the new Prescott 3.2GHz
was just above 90k nps, so getting double the speed now with a dual core P4 is
about what we could expect.

Only when it gets compared to a dual core dual Opteron does the real difference
become clear.
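
As a rough check of the numbers above, here is a small sketch of the ratios.
The variable names are made up for illustration, and 90,000 is only a rounded
stand-in for the "just above 90k nps" Prescott figure, so treat the output as
approximate.

```python
# Node rates in nps, taken from this thread. 90_000 is a rounded stand-in
# for the "just above 90k nps" Prescott 3.2GHz figure (an assumption).
prescott_single_nps = 90_000
dual_core_p4_nps = 188_000
dual_core_dual_opteron_nps = 529_000


def ratio(a, b):
    """Return a / b rounded to 2 decimals."""
    return round(a / b, 2)


p4_vs_prescott = ratio(dual_core_p4_nps, prescott_single_nps)
opteron_vs_p4 = ratio(dual_core_dual_opteron_nps, dual_core_p4_nps)

print("dual core P4 vs single Prescott:", p4_vs_prescott)
print("dual core dual Opteron vs dual core P4:", opteron_vs_p4)
```

Note that this compares whole machines: the Opteron box has 4 physical cores,
while the dual core P4 has 2 physical cores showing up as 4 logical processors.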

We should instead ask ourselves questions like: what speedup in total nps do 8
logical processors give on a dual Xeon built from dual core P4s?

Well, we'll get the answer to that in 2006, I guess. Don't wait for it.

>xeon, and a dual 3.6 in the lab.  My scaling from 1 to 2 threads (no
>hyper-threading turned on) is way better than 1.67X using 2 threads.
>
>For example, on my dual xeon, a quick test produced 1.51M nps using a single
>processor, 2.89M using both.  Which is a 1.91X increase in NPS.
>
>On a 4-way 850, I have large tests that produce 2.32M for one cpu, 8.69M for
>four cpus, which is 3.75X faster.  Comparing apples to apples, the dual 850 NPS
>was 4.53M, which turns into 1.95X faster.  So for me, the dual xeon and the dual

You are comparing a 64-bit Opteron with a 32-bit Intel. That's not a very fair
comparison IMHO, even though I'm sure you are not interested in knowing Intel's
64-bit speed. It will scale badly of course.

>opteron produce almost identical NPS scaling, although the opteron is about 2x
>faster per cpu...
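
For reference, the scaling figures quoted above are just the multi-CPU nps
divided by the single-CPU nps. A small sketch of that arithmetic follows; the
function name is made up for illustration, and the inputs are the figures from
the quoted post.

```python
def nps_scaling(nps_single, nps_multi):
    """NPS scaling factor: multi-CPU node rate over single-CPU node rate."""
    return nps_multi / nps_single


# Figures from the quoted post, in millions of nodes per second.
dual_xeon = nps_scaling(1.51, 2.89)
quad_850 = nps_scaling(2.32, 8.69)
dual_850 = nps_scaling(2.32, 4.53)

for name, s in [("dual xeon", dual_xeon),
                ("4-way 850", quad_850),
                ("dual 850", dual_850)]:
    print(f"{name}: {s:.2f}X")
```

This only measures how the raw node rate scales, which is not the same thing
as the speedup in time to reach a given depth.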

Please realize that the posted results are 32-bit and all those compilers are
P4 compilers; they do not take advantage of things that are fast on the Opteron.
Take for example all kinds of conditional instructions: 2 cycles on the Opteron,
7+ on the P4.

In 64 bits the Opteron of course gets more of a speedup than Intel does with its
tiny, slow caches. Conditional instruction optimizations, which compilers don't
do yet because they are slow on the P4, would speed up the Opteron even more for
our branchy chess programs.

Intel's real problem for computer chess is its small, slow, 1-port L1 cache.
The huge branch misprediction penalty isn't very helpful either, and replacing
branches with conditional move type instructions isn't easy, as in general those
are dead slow on it too; but even that is still not its really big problem.

Intel's known plans to modify the Pentium-M a bit in order to release it for the
desktop market are not very impressive either. It also has a 2 times smaller L1
cache than the Opteron. So they can hang all kinds of jingles and Christmas
bells on it and give it a bigger power budget, but it's not very promising for
computer chess either, especially if we realize it won't be ready to be
presented before the end of 2006. That means it gets sold somewhere in 2007.

Intel has simply lost the performance battle for desktop processors until the
start of 2007, and objective comparisons like these show it really clearly.

Even if Intel comes with a good replacement for the P4 by the end of 2006, by
then every normal computer user will realize that Intel sucks and AMD is fast,
so the damage Intel has done to itself in the desktop market is long term.

The performance difference is just too huge and will only get bigger thanks to
all kinds of future problems Intel will have in the coming 2 years.

To give a few examples:
  - 64 bits will help AMD more
  - when compilers improve and optimize better for AMD than they do now, there
will be an even bigger speed gain
  - when trying to clock higher, AMD still can clock higher; Intel is already
far over any decent wattage a processor can have

In short, only if Intel manages to create some sort of Cell type processor that
is also fast for integer workloads (IBM/Sony's Cell processor doesn't have
branch prediction, for example, which is kind of harakiri for chess programs),
only in that case does Intel stand a chance to come back in 2007. In all other
cases they had better start firing 60% of their personnel, starting with the
philosopher who caused Intel to be the ONLY manufacturer with tiny L1 caches.
However, if Intel currently had such a project and it was developing in a
positive way, they would already be bragging about it in all the newspapers.

Look at IBM, Sun, AMD, Transmeta etc. They all have WAY bigger L1 caches than
Intel's Itanium and P4 have, with the Pentium-M and its 64KB L1 sitting in the
middle of the pack.

Oh, for those interested: the original expectation was that the dual core Xeon
MPs / dual core Opteron 8xx's would be in the shops by Q1 2006.

I'm not going to wait for that; I've already thrown away all "intel inside"
stickers here.









