Computer Chess Club Archives



Subject: Re: Crafty profits little from Itanium and Opteron versus Commercials

Author: Sune Fischer

Date: 05:08:57 08/07/03



On August 07, 2003 at 07:49:20, Vincent Diepeveen wrote:

>>65% is disappointing for Crafty. It should get the 65% from register and cache
>>like all other programs, plus whatever speedup the bitboards get.
>
>Wrong.
>
>Non-bitboarders like Diep also have a lot of potential on more complex CPUs.

Of course.

> a) better usage of more registers (Crafty does things very simply, see the code)
> b) better usage of PGO (profile-guided optimization)

Yes, I think this might actually help non-bitboarders more. I got nothing from
PGO the last time I tested, but I don't have that many branches either; it's mostly tables.

> c) better icache usage (Crafty fits within the icache, so there is nothing to optimize)

Are you sure about that?
The rotated tables alone are 512 kB, AFAIK.

> d) better BPT (branch prediction table) usage. Crafty simply fits in
>    smaller BPTs.
>
>My guess is that Sjeng is so fast because of the extra registers and the bigger BPT,
>in combination with faster (75%) random-access latency, measured with Dieter's dblat
>program: 229 ns latency on the Opteron with 500MB cache, versus 400 ns on my dual K7
>machine. A dual Xeon also gives around 400 ns.
>
>Do not underestimate point d.
>
>Crafty is a simple program. Its main gain, a direct 33%, comes from going from
>32 to 64 bits. RAM latency then accounts for the vast majority of the rest.
>
>Please compare Crafty's speed on K7 versus the Itanium2 CPU:
>  Itanium2 Madison 1.3GHz == 907
>  MP2400 2.0GHz           == 1156
>  XP2100 1.73GHz          == 1022
>  XP1700 1.46GHz          == 867
>  XP1800 1.6GHz           == 903
>
>So for Crafty, a 1.6GHz K7 == a 1.3GHz Itanium2 Madison.
>
>Diep, a far more complex program, can profit more from a complex CPU, from
>PGO and from loops:
>  K7 2.0GHz == Itanium2 1.3GHz
>
>In short, Crafty profits less from new-generation CPUs than complex commercial
>programs do.
>
>Point made clear?

You seem to have missed one crucial point.
Crafty is a 64-bit program, which means it's slow on 32-bit hardware. Even I have
found that doing a lookup is faster than shifting: I never do 1<<sq, I use a table
for that. Little things like that are all over my program; when I remove them
and go pure 64-bit, I think a factor of 2, clock for clock, is reachable.

>See SPECint for Crafty:

Since they tested with 32-bit binaries, I'm not prepared to give those results
much credit.
Frankly, I want Eugene or Hyatt to produce the binary; it needs to be done right
or you lose 30% real quick. The pure C version is a lowest-common-denominator
compile; it basically sucks.

I also want to see other bitboard programs. I'm not sure Crafty is representative
of all of them; my program is very different, for better or worse of course.

-S.




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.