Author: Robert Hyatt
Date: 08:15:54 08/07/03
On August 07, 2003 at 07:49:20, Vincent Diepeveen wrote:

>On August 07, 2003 at 05:55:13, Sune Fischer wrote:
>
>>On August 07, 2003 at 05:30:57, Gian-Carlo Pascutto wrote:
>>
>>>>The Opteron was tested in SPEC. For their entire set of programs (also
>>>>non-chess and hence non-bitboarders), the Opteron is clock for clock 55%
>>>>faster than the Athlon XP.
>>>
>>>...and it's 74% faster clock for clock than the P4.
>>>
>>>For Crafty, the Opteron is 65% faster clock for clock than the Athlon XP.
>>>It's 165% (!!) faster clock for clock than the Pentium 4.
>>>(All from SPEC data)
>>>
>>>           Opteron vs Athlon   Opteron vs P4
>>>SPEC             1.55              1.74
>>>Crafty           1.65              2.65
>>>Sjeng            1.70              2.05
>>>
>>>Crafty gets 7% more speedup on the Opteron compared to the average program,
>>>and Sjeng gets 10% more speedup compared to the average program.
>>>
>>>Based on this, it looks to me like the speedups are much more due to the
>>>improved architecture (more registers, insanely fast latency, better branch
>>>prediction, larger caches) than the 32 bit vs 64 bit difference. Crafty
>>>definitely has more 64 bit arithmetic than I do.
>>
>>65% is disappointing for Crafty. It should get the 65% from register and cache
>>like all other programs, plus whatever speedup the bitboards get.
>
>Wrong.
>
>Non bitboarders like diep have also a lot of potential for more complex cpu's.
>
> a) better usage of more registers (crafty is doing things very simple see code)

:) Intelligent comment, knowing how badly it needs more registers.

> b) better usage of PGO (profile guided optimizations)
> c) better icache usage (crafty fits within icache, nothing to optimize)

Please don't expose your ignorance to the world. _anybody_ can look at the
crafty "kernel" to see that it has zero chance to fit into any I-cache around.

> d) better bpt (branch prediction table) usage. crafty simply fits in
> smaller bpt's.

You don't even understand branch prediction as it is, so why make such a
shallow comment?
PIV has _fewer_ BTB entries, but branch prediction is _much_ better than on
previous Intel processors. Do you know why? I do. Do you understand predicting
_patterns_ as opposed to predicting a single branch based on its history? I do.
And so does Intel.

>
>My guess is sjeng is so fast because of the more registers and the bigger bpt in
>combination with faster (75%) random latency, measured with Dieter's dblat
>program: 229 ns latency on the Opteron with 500MB cache, versus 400 on my dual
>K7 machine. A dual Xeon also gives around 400.
>
>Do not underestimate point d.
>
>Crafty is a simple program. The main gain, a direct 33%, is from 32==>64 bits.
>Then RAM latency does the vast rest.
>
>Please compare speedup of crafty from K7 to itanium2 cpu:
>  itanium2 madison 1.3GHz == 907
>  MP2400 2.0GHz == 1156
>  XP2100 1.73GHz == 1022
>  XP1700 1.46GHz == 867
>  XP1800 1.6GHz == 903
>
>So for crafty K7 1.6GHz == Itanium2 madison 1.3GHz
>
>Now diep, a far more complex program, can profit more from a complex cpu and
>the PGO and loops:
>  K7 2GHz == itanium2 1.3GHz
>
>In short, Crafty profits less from new generation cpu's than complex commercial
>programs.
>
>Point made clear?
>
>See for crafty specint:
>
>itanium at specint:
>http://www.specbench.org/osg/cpu2000/results/res2003q3/cpu2000-20030616-02241.html
>
>XP1800
>http://www.specbench.org/osg/cpu2000/results/res2001q4/cpu2000-20011008-01009.html
>
>MP2400 2.0GHz at specint:
>http://www.specbench.org/osg/cpu2000/results/res2002q4/cpu2000-20021118-01838.html
>
>K7 1.73GHz
>http://www.specbench.org/osg/cpu2000/results/res2002q2/cpu2000-20020422-01326.html
>
>>Not easy to optimize for new architecture without having access to the machine,
>>but I'd be _very_ surprised if a non-bitboarder turns out to be faster, it must
>>indicate a problem of some sort.
>>
>>Perhaps there is a bottleneck after all, Crafty has a ton of tables, many of
>>those could be generated on the fly. Could it really be thrashing the large
>>cache?
>>It should be interesting to compare with e.g. gnuchess or some other bitboard
>>prog though.
>>
>>>When running 32 bit software (or more precisely, software using the old 8
>>>register instruction set), the Opteron is 42% and 65% faster than the Athlon
>>>and P4 respectively.
>>>
>>>This means that, even when running NOT optimized software, a 2.0GHz Opteron
>>>is _still_ FASTER THAN ANY ATHLON XP OR PENTIUM 4 ON THE MARKET.
>>
>>You seem surprised? ;)
>>
>>>--
>>>GCP
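The difference Hyatt draws between predicting a single branch from its own history and predicting _patterns_ can be illustrated with a small simulation. This is a minimal textbook-style sketch, not a description of the Pentium 4's actual (undisclosed here) predictor: the function names `bimodal_mispredicts` and `two_level_mispredicts`, the 2-bit history length, and the taken/taken/not-taken loop branch are all hypothetical choices for illustration. The first predictor gives the branch one 2-bit saturating counter; the second uses the branch's last two outcomes to select one of four 2-bit counters (a two-level adaptive scheme).

```python
from typing import List


def _update(counter: int, taken: bool) -> int:
    """Move a 2-bit saturating counter (0..3) toward the actual outcome."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


def bimodal_mispredicts(outcomes: List[bool]) -> int:
    """One 2-bit counter for the branch: predicts only its overall bias."""
    counter = 2  # start at 'weakly taken'
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:  # predict taken when counter is 2 or 3
            misses += 1
        counter = _update(counter, taken)
    return misses


def two_level_mispredicts(outcomes: List[bool], history_bits: int = 2) -> int:
    """The last `history_bits` outcomes (the pattern) select which counter to use."""
    mask = (1 << history_bits) - 1
    history = 0  # most recent outcome is bit 0
    table = [2] * (1 << history_bits)  # one counter per history pattern
    misses = 0
    for taken in outcomes:
        if (table[history] >= 2) != taken:
            misses += 1
        table[history] = _update(table[history], taken)
        history = ((history << 1) | int(taken)) & mask
    return misses


if __name__ == "__main__":
    # Hypothetical loop-closing branch: taken twice, then falls through, repeated.
    pattern = [True, True, False] * 100
    print("per-branch counter:", bimodal_mispredicts(pattern))     # 100
    print("two-level (pattern):", two_level_mispredicts(pattern))  # 1
```

On this pattern the single counter misses once every period (100 misses in 300 executions), while the history-indexed table misses once during warm-up and then predicts perfectly. That is consistent with Hyatt's point that fewer BTB entries need not mean worse prediction: in this model the accuracy comes from what the recent pattern selects, not from how many branches get their own entry.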
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.