Author: Vincent Diepeveen
Date: 18:52:46 12/31/02
On December 31, 2002 at 17:22:42, Rick Terry wrote:
>A friend of mine who works at USA Computers seems convinced that The Pentium 4
>512 FSB outperforms the Athlon in Every Bench Mark. I didn't know enough to
>argue with him, repeating only what I heard here about AMD Processors being
>Superior to the Pentium in Running Chess Programs, But Why?, Why would the
>Pentium Perform better then the Athlon in all other Benchmarks but Chess?
There are a number of reasons, but the most important one comes down to this:
Chess programs are written by very good programmers, who have optimized them
so well that even the most complex commercial chess programs basically
depend on processor speed, whereas your friend's applications
have more to do with the Level 2 cache.
The good chess programmers managed to optimize them so far that they
rely less on L2 cache speed.
It is of course very bad that some applications depend upon L2 cache speed.
They should have used better programmers for it!
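To make that distinction concrete, here is a rough back-of-the-envelope model in Python. All the numbers are made-up assumptions for illustration, not measurements of a P4 or K7, and the function name `run_time` is hypothetical. It simply adds the time the core spends executing to the time spent waiting on cache misses.

```python
def run_time(instructions, ipc, clock_hz, misses_per_instr, miss_penalty_s):
    """Rough run time: core execution time plus stall time from cache misses."""
    core_time = instructions / (ipc * clock_hz)
    stall_time = instructions * misses_per_instr * miss_penalty_s
    return core_time + stall_time

N = 1e9          # instructions executed (assumed)
CLOCK = 2e9      # 2 GHz clock (assumed)
PENALTY = 100e-9 # 100 ns per cache miss (assumed)

# Hypothetical well-optimized chess program: almost never misses the cache.
chess_slow = run_time(N, 1.0, CLOCK, 0.0001, PENALTY)
chess_fast = run_time(N, 2.0, CLOCK, 0.0001, PENALTY)

# Hypothetical cache-bound application: one miss per 100 instructions.
app_slow = run_time(N, 1.0, CLOCK, 0.01, PENALTY)
app_fast = run_time(N, 2.0, CLOCK, 0.01, PENALTY)

# Speedup each program gets when the core does twice the work per clock.
chess_speedup = chess_slow / chess_fast
app_speedup = app_slow / app_fast
print(f"chess program speedup: {chess_speedup:.2f}x")
print(f"cache-bound app speedup: {app_speedup:.2f}x")
```

With these assumed numbers the chess program gets nearly the full 2x from a core that does twice the work per clock, while the cache-bound application gets only about 1.2x, because most of its time goes to waiting on memory.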
The P4 has an excellent L2 cache, whereas the processor itself
is a very bad piece of work compared to the much older K7 processor.
If you simply look at what a very complex program can execute per clock cycle
on a P4 versus a K7, then the K7 has a huge number of resources,
and it is more than amazing that a newer generation processor (the P4)
is not capable of outperforming it at all when talking
about the CPU itself.
SMT/HT, when it gets made for the AMD processor, will of course let the
K7 processor profit *way* more than the P4 profits from it (after they
add some registers of course).
Intel has however a lot more experience with multiprocessing than AMD,
so we will see how this develops in the future.
A short look at the Opteron:
- huge L1 cache (sorry, I do not know off the top of my head how big exactly,
but the K7 already has about 128KB of L1 cache versus 8KB on the P4)
- 16384 BTB/BPT entries versus just 512 on the P3 (K7=2048). I didn't
even look up how many the P4 has, because the P4 obviously needs
a lot more, but I bet it doesn't have 16384 entries.
- locking the processor seems a lot easier (for my own software),
and if I continuously lock from 2 processors on the same cache line it costs
about 0.3 seconds in 5 minutes. I get the impression that current DIEP
versions perform a lot better on the P4 after I deliberately put
a lot of effort into locking less in DIEP. In fact it locks in such a way
that other processors can run on without *ever* getting hurt. It
can split and search and unlock without other processors even
seeing it.
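To put the locking figure in proportion, the quick calculation below turns the 0.3 seconds per 5 minutes quoted above into a fraction of the run time. The variable names are only illustrative, and this is just arithmetic on the numbers in the post, not a new measurement.

```python
lock_cost_s = 0.3   # time lost to locking, as quoted in the post
run_s = 5 * 60      # a 5 minute run, as quoted in the post

# Fraction of the run spent on the contended lock.
overhead = lock_cost_s / run_s
print(f"locking overhead: {overhead:.2%}")
```

That works out to about a tenth of a percent of the run, even with two processors hammering the same cache line.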
Yet the average parallel software is not so well written. Then the K7
wins big time when it is about CPU speed and locking that CPU.
On the other hand, because of its good cache levels (no complaints here
about the P4), the effective bandwidth should be higher on the P4. Yet the P4 is
an entirely new design and the K7 is already pretty old by now. So if we
take a look at AMD's new design, where the memory controller is on
the CPU, then that will of course outgun the P4 by a large margin.
The positive news is that lately the P4s are improving a lot, yet the first
few benchmarks of the new AMD processors are so impressive that Intel
needs an entirely new CPU to compete with them on non-chess
programs. If you then also consider that the current K7 is already clocked
at 2.x GHz and that the Opteron has 12 stages or something,
then I am sure the new line of AMD processors can easily get
clocked to 3.0 GHz too.
In general Intel is a lot faster at releasing its processors than AMD.
AMD's performance per clock cycle is a lot higher than the P4 will ever
get, so Intel has a big job to release a new processor which combines
the positive things of the P4 with high processor speed.
The sad thing here is that most benchmarks don't care shit about the actual
processor execution speed for complex software, but basically need
bigger bandwidth and lower latencies for memory accesses.
That is really sad for computer chess, because you measure how good
the caches are more than how fast the processor is.
AMD is definitely not a hair better here than Intel. If we look at the new
Opteron, we see it can still execute only 3 instructions a clock.
A small look back in history shows that the Pentium Pro could already do that,
and that the P4 and the new AMD processors will still do 3 instructions a clock.
This is really sad.
Computer chess will basically profit most if they go from 3 to something like
6 instructions a clock.
In that respect the new supercomputer CPUs which will be released in
the coming few years by different manufacturers (most notably Intel)
will kick major butt.
Where a single supercomputer CPU is now clocked at around 1 to 1.3 GHz
versus the K7 at 2.x and the P4 already at 2.8-3.06 GHz,
soon that huge difference will get smaller and smaller, and their
releases will follow the x86 releases a lot more closely.
Then again, a single supercomputer CPU will blow away any single x86
CPU by a large margin, simply because it has a huge L2 or L3 cache
and can effectively execute a huge number of instructions a clock.
Way more than any x86 processor can and, I fear, will do in the near future.
In that respect the development of x86 cpu's is not very positive for
computer chess.
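As a rough check on the clock speeds quoted above, the sketch below computes how many times more instructions per clock a slower-clocked CPU has to sustain to match a faster-clocked one. It assumes throughput is simply instructions per clock times clock speed, which ignores memory effects entirely, and the function name `breakeven_ipc_ratio` is hypothetical.

```python
def breakeven_ipc_ratio(fast_clock_ghz, slow_clock_ghz):
    """Factor by which the slower-clocked CPU must beat the faster one in
    instructions per clock to match it (throughput = IPC * clock)."""
    return fast_clock_ghz / slow_clock_ghz

# Clock speeds taken from the post: P4 at 3.06 GHz, supercomputer CPU at 1 to 1.3 GHz.
vs_1_3ghz = breakeven_ipc_ratio(3.06, 1.3)
vs_1_0ghz = breakeven_ipc_ratio(3.06, 1.0)
print(f"1.3 GHz CPU needs {vs_1_3ghz:.2f}x the IPC of a 3.06 GHz P4")
print(f"1.0 GHz CPU needs {vs_1_0ghz:.2f}x the IPC of a 3.06 GHz P4")
```

So at those clock speeds such a CPU has to sustain roughly 2.4 to 3.1 times the instructions per clock of the P4 just to draw level on a single CPU, and every bit of clock speed it gains lowers that bar.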
Best regards,
Vincent
Last modified: Thu, 15 Apr 21 08:11:13 -0700