Computer Chess Club Archives


Subject: AMD64 for chess

Author: Vincent Diepeveen

Date: 09:14:15 09/23/03


Hello,

Many of you are looking forward to the new era of cheap 64-bit computing, with
AMD64 being the first cheap 64-bit processor to be released. The economy is not
exactly booming. We can't complain too loudly about the economy in the western
world, but manufacturers feel a big decline in sales when the economy booms a
little less than it used to.

Therefore there is a lot of interesting news on the hardware front. Even in 100
pages I could not describe everything that's new and interesting to me, so I'll
just focus on what is interesting for computer chess.

The movement at the high end is obvious. A few years ago there were big,
expensive supercomputer processors which outgunned any cheap PC processor by a
wide margin.

That has changed for what we call 'integer' software. The 'highend' processors
still do well for floating point, especially some vector processors, but not for
integers. 'Integers' are whole numbers: 1, 5, -4 and so on.

Numbers like 0.01, 5.05, -0.05 or -0.0000000134 are all called floating point.
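To make the integer versus floating point distinction concrete, here is a minimal Python sketch. It is only an illustration using the example numbers from the post; the variable names are mine and nothing here comes from any chess program.

```python
# Whole numbers are stored as integers and are exact.
whole_numbers = [1, 5, -4]
# Numbers with a fractional part are stored as floating point.
fractional_numbers = [0.01, 5.05, -0.05, -0.0000000134]

all_ints = all(isinstance(n, int) for n in whole_numbers)
all_floats = all(isinstance(x, float) for x in fractional_numbers)

# Integer arithmetic gives exact results.
int_sum = sum(whole_numbers)

# Floating point values are binary approximations, so decimal
# fractions such as 0.1 and 0.2 do not add up to exactly 0.3.
float_sum_is_exact = (0.1 + 0.2 == 0.3)

print(all_ints, all_floats, int_sum, float_sum_is_exact)
```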

There are many test sets which measure processor strength. Most of that is not
so interesting here. Not very interesting either is what we call SSE/SSE2, the
128-bit instructions.

As you can read in the big technical analysis of the Opteron/AMD64 at
www.chip-architect.com, those special instructions that operate on a lot of
bits are very slow.

A 2GHz Opteron delivers 2 billion 'clocks' or 'cycles' a second.

Trivially the more basic instructions we can execute, the better.

We see a lot of discussions at the CCC about using SSE2/SSE for chess.

However, I always have to laugh out loud when I see that. Let me quote Hans de
Vries on the new Intel Prescott:

"This would bring back the SSE2 latencies for Add and Multiply to 5 and 7
cycles"

Note that on the current P4 it is 25% slower than that.

A good chess program, however, can execute up to 3 'integer' instructions per
cycle.

So that's like 10 times faster on average than using SSE2, even though you can
in theory use both 'simultaneously'.
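As a rough back-of-envelope check of that comparison, the sketch below uses only numbers quoted in this post: a 2GHz clock, up to 3 integer instructions per cycle, and SSE2 latencies of 5 and 7 cycles. It assumes a fully dependent chain of SSE2 operations, so that one operation completes per latency period; that assumption is mine, and real code with independent operations would pipeline better. The peak of 3 integer instructions per cycle is likewise an upper bound, which is one reason this crude model lands above the 'like 10 times' average.

```python
# Figures taken from the post; the model itself is a simplification.
CLOCK_HZ = 2_000_000_000      # 2 GHz = 2 * 10^9 cycles per second
INT_INSTR_PER_CYCLE = 3       # peak integer instructions per cycle
SSE2_ADD_LATENCY = 5          # cycles, quoted Prescott figure
SSE2_MUL_LATENCY = 7          # cycles, quoted Prescott figure

# Peak integer instructions per second.
int_per_sec = INT_INSTR_PER_CYCLE * CLOCK_HZ

# Assumption: each SSE2 op waits for the previous one (latency-bound).
sse2_add_per_sec = CLOCK_HZ // SSE2_ADD_LATENCY
sse2_mul_per_sec = CLOCK_HZ // SSE2_MUL_LATENCY

add_ratio = int_per_sec / sse2_add_per_sec
mul_ratio = int_per_sec / sse2_mul_per_sec

print(f"integer peak : {int_per_sec:,} instructions/s")
print(f"SSE2 add     : {sse2_add_per_sec:,} ops/s  (integer is {add_ratio:.0f}x)")
print(f"SSE2 multiply: {sse2_mul_per_sec:,} ops/s  (integer is {mul_ratio:.0f}x)")
```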

So for chess the only interesting thing is the integer speed. Integers can be 8
bits, 16 bits, 32 bits and nowadays, on the Opteron, also 64 bits.

A lot of 'commercial' chess programs, as well as what are officially called
'strong amateur programs' (a label which doesn't necessarily say anything about
which of the two is stronger), mix 8-bit code with 32-bit code a lot.

I found out on the Opteron (but I could have been misguided, of course) that
this is NOT a good idea.

On the K7 8-bit code is very fast, on the P4 it's already not so interesting to
use, but on the Opteron it's seemingly a lot slower.

DIEP doesn't have that problem. DIEP is 32 bits *all the way*.

With the exception of my node count, you won't find much 64-bit code in DIEP YET.

Therefore it is ideal to run on hardware like the opteron/amd64.

The speed of the AMD64 is very convincing. I need to add one big note, and
that's that the latest P4s do a lot better than older P4s. I do not know yet
what they modified in the cores, but they are doing a lot better than they used
to.

At aceshardware.com you can see the results. Note that for the P4 an SMP version
was used, not a NUMA version. Also, different versions were used to get faster
on the P4 and dual P4 Xeon.

I managed to improve DIEP a lot to run faster on the dual P4 Xeon 3.06GHz, and
with 4 'threads' it manages a speed of 227k nps.

If we consider that a single-CPU AMD64 2.4GHz already gets 149k nps with the
same executable, then I don't need to comment much more.

If we compare that with the 'old' P4 3.06GHz, which gets 89k nps with this
version, and the Athlon MP2600 at 2.127GHz, which gets 95k nps on a single CPU,
then it is needless to say that the AMD64 is a big winner.

It's more than 50% faster than my K7 (149k against 95k nps), which is the
highest clocked MP version (MP2800 isn't clocked higher).
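The arithmetic behind these comparisons can be checked with a short Python sketch. The clock speeds and nps figures are the ones quoted in this post; the nps-per-GHz column is my own derived metric, added only to show how much more work the AMD64 does per clock.

```python
# (clock in GHz, nodes per second) for the single-CPU runs quoted in the post.
results = {
    "AMD64 2.4GHz": (2.4, 149_000),
    "Athlon MP2600 (K7)": (2.127, 95_000),
    "old P4 3.06GHz": (3.06, 89_000),
}
# Dual P4 Xeon 3.06GHz running 4 threads, as quoted in the post.
DUAL_XEON_4_THREADS_NPS = 227_000

amd64_nps = results["AMD64 2.4GHz"][1]

# How many times faster the single AMD64 is than each machine.
speedup = {name: amd64_nps / nps for name, (_, nps) in results.items()}
# Nodes per second per GHz of clock (derived, not from the post).
nps_per_ghz = {name: nps / ghz for name, (ghz, nps) in results.items()}
# Dual Xeon with 4 threads relative to one AMD64.
xeon_vs_amd64 = DUAL_XEON_4_THREADS_NPS / amd64_nps

for name, (ghz, nps) in results.items():
    print(f"{name:20s} {nps:>7d} nps  {nps_per_ghz[name]:8.0f} nps/GHz  "
          f"AMD64 is {speedup[name]:.2f}x")
print(f"dual Xeon, 4 threads vs one AMD64: {xeon_vs_amd64:.2f}x")
```

On these figures the single AMD64 is a bit over one and a half times as fast as either the K7 or the old P4, and the dual Xeon needs all 4 threads to get roughly one and a half times the nps of a single AMD64.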

For more details just look at aceshardware.com. My own impression is that what
was improved in the AMD64 is especially the branch prediction. It is as if it
hardly suffers from branch mispredictions. That's really amazing.

It isn't really new, but they got it to work great on the AMD64. This, in
combination with a larger branch prediction table and all kinds of other
advantages, is really great.

Next posting: GCC on the quad Opteron




Last modified: Thu, 15 Apr 21 08:11:13 -0700
