Author: Vincent Diepeveen
Date: 04:49:20 08/07/03
On August 07, 2003 at 05:55:13, Sune Fischer wrote:

>On August 07, 2003 at 05:30:57, Gian-Carlo Pascutto wrote:
>
>>>The Opteron was tested in SPEC. For their entire set of programs (also
>>>non-chess and hence non-bitboarders), the Opteron is clock for clock 55%
>>>faster than the Athlon XP.
>>
>>...and it's 74% faster clock for clock than the P4.
>>
>>For Crafty, the Opteron is 65% faster clock for clock than the Athlon XP.
>>It's 165% (!!) faster clock for clock than the Pentium 4.
>>(All from SPEC data)
>>
>>          Opteron vs Athlon   Opteron vs P4
>>SPEC            1.55              1.74
>>Crafty          1.65              2.65
>>Sjeng           1.70              2.05
>>
>>Crafty gets 7% more speedup on the Opteron compared to the average program,
>>and Sjeng gets 10% more speedup compared to the average program.
>>
>>Based on this, it looks to me like the speedups are much more due to the
>>improved architecture (more registers, insanely fast latency, better branch
>>prediction, larger caches) than the 32 bit vs 64 bit difference. Crafty
>>definitely has more 64 bit arithmetic than I do.
>
>65% is disappointing for Crafty. It should get the 65% from register and cache
>like all other programs, plus whatever speedup the bitboards get.

Wrong. Non-bitboarders like Diep also have a lot of potential on more complex CPUs:

a) better usage of more registers (Crafty does things very simply, see the code)
b) better usage of PGO (profile guided optimizations)
c) better icache usage (Crafty fits within the icache, nothing to optimize)
d) better BPT (branch prediction table) usage. Crafty simply fits in smaller BPTs.

My guess is that Sjeng is so fast because of the extra registers and the bigger BPT, in combination with faster (75%) random latency, measured with Dieter's dblat program: 229 ns latency on the Opteron with 500MB cache, versus 400 on my dual K7 machine. A dual Xeon also gives around 400.

Do not underestimate point d. Crafty is a simple program. Its main profit, directly 33%, is 32==>64 bits. RAM latency then accounts for most of the rest.
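The percentages quoted above can be rechecked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the table and the dblat latencies in this post, while the names `RATIOS`, `extra_speedup` and `latency_gain` are made up for illustration.

```python
# Clock-for-clock ratios copied from the quoted table
# (Opteron relative to Athlon XP and to P4).
RATIOS = {
    "SPEC":   {"athlon": 1.55, "p4": 1.74},
    "Crafty": {"athlon": 1.65, "p4": 2.65},
    "Sjeng":  {"athlon": 1.70, "p4": 2.05},
}


def extra_speedup(program, baseline="athlon"):
    """Extra speedup of `program` over the SPEC average, as a fraction."""
    return RATIOS[program][baseline] / RATIOS["SPEC"][baseline] - 1.0


def latency_gain(old_ns, new_ns):
    """How much faster the new latency is than the old one, as a fraction."""
    return old_ns / new_ns - 1.0


if __name__ == "__main__":
    for name in ("Crafty", "Sjeng"):
        print(f"{name}: {extra_speedup(name):.1%} over the SPEC average")
    # 400 ns (dual K7) versus 229 ns (Opteron), as measured with dblat
    print(f"random latency: {latency_gain(400, 229):.1%} faster")
```

With these inputs the Athlon column gives roughly 6.5% for Crafty and 9.7% for Sjeng, which matches the rounded 7% and 10% in the quote, and the latency ratio comes out at just under 75%.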
Please compare the speedup of Crafty from the K7 to the Itanium2 CPU:

itanium2 madison 1.3Ghz == 907
MP2400 2.0Ghz == 1156
XP2100 1.73Ghz == 1022
XP1700 1.46Ghz == 867
XP1800 1.6Ghz == 903

So for Crafty: K7 1.6Ghz == Itanium2 madison 1.3Ghz

Now Diep, a far more complex program, can profit more from a complex CPU, from PGO and from loops: K7 2Ghz == itanium2 1.3Ghz

In short, Crafty profits less from new-generation CPUs than complex commercial programs do. Point made clear?

See the specint results for Crafty:

itanium at specint:
http://www.specbench.org/osg/cpu2000/results/res2003q3/cpu2000-20030616-02241.html

XP1800:
http://www.specbench.org/osg/cpu2000/results/res2001q4/cpu2000-20011008-01009.html

MP2400 2.0Ghz at specint:
http://www.specbench.org/osg/cpu2000/results/res2002q4/cpu2000-20021118-01838.html

K7 1.73Ghz:
http://www.specbench.org/osg/cpu2000/results/res2002q2/cpu2000-20020422-01326.html

>Not easy to optimize for new architecture without having access to the machine,
>but I'd be _very_ surprised if a non-bitboarder turns out to be faster, it must
>indicate a problem of some sort.
>
>Perhaps there is a bottleneck after all, Crafty has a ton of tables, many of
>those could be generated on the fly. Could it really be thrashing the large
>cache?
>It should be interesting to compare with e.g. gnuchess or some other bitboard
>prog though.
>
>>When running 32 bit software (or more precisely, software using the old 8
>>register instruction set), the Opteron is 42% and 65% faster than the Athlon
>>and P4 respectively.
>>
>>This means that, even when running NOT optimized software, a 2.0Ghz Opteron
>>is _still_ FASTER THAN ANY ATHLON XP OR PENTIUM 4 ON THE MARKET.
>
>You seem surprised? ;)
>
>>--
>>GCP
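The Crafty SPEC scores listed in this post can be normalised per GHz to check the "K7 1.6Ghz == Itanium2 madison 1.3Ghz" equivalence. This is a rough sketch that assumes the score scales linearly with clock speed within one CPU family, which is a simplification. The names `SCORES`, `per_ghz` and `equivalent_clock` are illustrative only.

```python
# (cpu, clock in GHz, Crafty SPEC score) as listed in the post
SCORES = [
    ("itanium2 madison", 1.30, 907),
    ("MP2400", 2.00, 1156),
    ("XP2100", 1.73, 1022),
    ("XP1700", 1.46, 867),
    ("XP1800", 1.60, 903),
]


def per_ghz(score, ghz):
    """Crafty score per GHz of clock speed."""
    return score / ghz


def equivalent_clock(target_score, ref_score, ref_ghz):
    """Clock the reference CPU would need to reach target_score.

    Assumes linear scaling with clock, which real hardware only approximates.
    """
    return target_score / per_ghz(ref_score, ref_ghz)


if __name__ == "__main__":
    for cpu, ghz, score in SCORES:
        print(f"{cpu:18s} {per_ghz(score, ghz):6.1f} per GHz")
    # K7 clock needed to match the Itanium2 score, taking the XP1800 as reference
    print(f"K7 equivalent: {equivalent_clock(907, 903, 1.60):.2f} GHz")
```

Under that assumption the Itanium2 lands at about 698 per GHz against roughly 564 to 594 for the K7 parts, and the K7 would need about 1.61 GHz to match the Itanium2 score for Crafty.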
Last modified: Thu, 15 Apr 21 08:11:13 -0700