Computer Chess Club Archives



Subject: Re: Crafty profits little from Itanium and Opteron versus Commercials

Author: Robert Hyatt

Date: 08:15:54 08/07/03



On August 07, 2003 at 07:49:20, Vincent Diepeveen wrote:

>On August 07, 2003 at 05:55:13, Sune Fischer wrote:
>
>>On August 07, 2003 at 05:30:57, Gian-Carlo Pascutto wrote:
>>
>>>>The Opteron was tested in SPEC. For their entire set of programs (also
>>>>non-chess and hence non-bitboarders), the Opteron is clock for clock 55%
>>>>faster than the Athlon XP.
>>>
>>>...and it's 74% faster clock for clock than the P4.
>>>
>>>For Crafty, the Opteron is 65% faster clock for clock than the Athlon XP.
>>>It's 165% (!!) faster clock for clock than the Pentium 4.
>>>(All from SPEC data)
>>>
>>>                Opteron vs Athlon XP   Opteron vs P4
>>>SPEC                  1.55                   1.74
>>>Crafty                1.65                   2.65
>>>Sjeng                 1.70                   2.05
>>>
>>>Crafty gets 7% more speedup on the Opteron compared to the average program,
>>>and Sjeng gets 10% more speedup compared to the average program.
>>>
>>>Based on this, it looks to me like the speedups are much more due to the
>>>improved architecture (more registers, insanely fast latency, better branch
>>>prediction, larger caches) than the 32 bit vs 64 bit difference. Crafty
>>>definitely has more 64 bit arithmetic than I do.
>>
>>65% is disappointing for Crafty. It should get the 65% from register and cache
>>like all other programs, plus whatever speedup the bitboards get.
>
>Wrong.
>
>Non-bitboarders like Diep also have a lot to gain from more complex CPUs.
>
> a) better usage of more registers (Crafty does things very simply; see the code)

:)  Intelligent comment, given how badly Crafty needs more registers.


> b) better usage of PGO (profile guided optimizations)
> c) better icache usage (Crafty fits within the icache, nothing to optimize)

Please don't expose your ignorance to the world.  _Anybody_ can look at
the Crafty "kernel" and see that it has zero chance of fitting into any
I-cache around.

> d) better BPT (branch prediction table) usage. Crafty simply fits in
>    smaller BPTs.

You don't even understand branch prediction as it is, so why make such
a shallow comment?  The PIV has _fewer_ BTB entries, but its branch prediction
is _much_ better than on previous Intel processors.  Do you know why?  I do.
Do you understand predicting _patterns_ as opposed to predicting a single
branch based on its history?  I do.  And so does Intel.
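
[Editor's illustration] The distinction drawn above, predicting _patterns_ rather than a single branch from its own bias, can be shown with a textbook toy model. This is only a sketch under stated assumptions: it compares a single 2-bit saturating counter against a simple two-level scheme in which recent outcomes select the counter. It is not a description of the PIV's actual predictor, which the post does not detail, and the function names and the 4-bit history length are made up for the example.

```python
def bimodal_accuracy(outcomes):
    """One 2-bit saturating counter: tracks only the branch's overall bias."""
    counter = 2  # 0..1 predict not-taken, 2..3 predict taken
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        # Saturating update toward the actual outcome.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)


def two_level_accuracy(outcomes, history_bits=4):
    """Recent outcomes (the history) select one of 2**history_bits counters."""
    mask = (1 << history_bits) - 1
    counters = [2] * (1 << history_bits)
    history = 0
    correct = 0
    for taken in outcomes:
        if (counters[history] >= 2) == taken:
            correct += 1
        c = counters[history]
        counters[history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into the history register.
        history = ((history << 1) | int(taken)) & mask
    return correct / len(outcomes)


# A branch that repeats taken, taken, not-taken (hypothetical test pattern).
pattern = [True, True, False] * 1000

bimodal = bimodal_accuracy(pattern)
two_level = two_level_accuracy(pattern)
print(f"bimodal:   {bimodal:.3f}")
print(f"two-level: {two_level:.3f}")
```

On this repeating pattern the single counter keeps mispredicting the not-taken case, while the history-indexed table learns the sequence after a short warm-up. That is the general reason a predictor with fewer entries can still predict better.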



>
>My guess is Sjeng is so fast because of the extra registers and the bigger BPT, in
>combination with 75% faster random-access latency, measured with Dieter's dblat
>program: 229 ns latency on the Opteron with 500MB cache, versus 400 ns on my dual K7
>machine. A dual Xeon also gives around 400 ns.
>
>Do not underestimate point d.
>
>Crafty is a simple program. The main gain, 33% right away, comes from 32 ==> 64 bits.
>RAM latency then accounts for most of the rest.
>
>Please compare the speedup of Crafty from the K7 to the Itanium2 CPU
>(crafty SPECint results, links below):
>  Itanium2 Madison 1.3GHz == 907
>  MP2400 2.0GHz           == 1156
>  XP2100 1.73GHz          == 1022
>  XP1700 1.46GHz          == 867
>  XP1800 1.6GHz           == 903
>
>So for Crafty, K7 1.6GHz == Itanium2 Madison 1.3GHz
>
>Now Diep, a far more complex program, can profit more from a complex CPU,
>from PGO, and from loops:
>  K7 2GHz == Itanium2 1.3GHz
>
>In short, Crafty profits less from new-generation CPUs than complex commercial
>programs do.
>
>Point made clear?
>
>See the SPECint results for Crafty:
>
>Itanium2 at SPECint:
>http://www.specbench.org/osg/cpu2000/results/res2003q3/cpu2000-20030616-02241.html
>
>XP1800 at SPECint:
>http://www.specbench.org/osg/cpu2000/results/res2001q4/cpu2000-20011008-01009.html
>
>MP2400 2.0GHz at SPECint:
>http://www.specbench.org/osg/cpu2000/results/res2002q4/cpu2000-20021118-01838.html
>
>K7 1.73GHz at SPECint:
>http://www.specbench.org/osg/cpu2000/results/res2002q2/cpu2000-20020422-01326.html
>
>>It is not easy to optimize for a new architecture without having access to the
>>machine, but I'd be _very_ surprised if a non-bitboarder turns out to be faster;
>>that must indicate a problem of some sort.
>>
>>Perhaps there is a bottleneck after all. Crafty has a ton of tables, many of
>>which could be generated on the fly. Could it really be thrashing the large
>>cache?
>>It would be interesting to compare with e.g. gnuchess or some other bitboard
>>prog though.
>>
>>>When running 32 bit software (or more precisely, software using the old 8
>>>register instruction set), the Opteron is 42% and 65% faster than the Athlon
>>>and P4 respectively.
>>>
>>>This means that, even when running NOT optimized software, a 2.0GHz Opteron
>>>is _still_ FASTER THAN ANY ATHLON XP OR PENTIUM 4 ON THE MARKET.
>>
>>You seem surprised? ;)
>>
>>>--
>>>GCP



