Author: Uri Blass
Date: 05:15:08 08/07/03
On August 07, 2003 at 08:08:57, Sune Fischer wrote:

>On August 07, 2003 at 07:49:20, Vincent Diepeveen wrote:
>
>>>65% is disappointing for Crafty. It should get the 65% from register and cache
>>>like all other programs, plus whatever speedup the bitboards get.
>>
>>Wrong.
>>
>>Non bitboarders like diep have also a lot of potential for more complex cpu's.
>
>Of course.
>
>> a) better usage of more registers (crafty is doing things very simple see code)
>> b) better usage of PGO (profile guided optimizations)
>
>Yes I think this might actually help non-bitboarders more, I got nothing from
>PGO last I tested, I don't really have that many branches either, mostly tables.
>
>> c) better icache usage (crafty fits within icache, nothing to optimize)
>
>Are you sure about that?
>The rotated tables alone are 512 kB afaik.
>
>> d) better bpt (branch prediction table) usage. crafty fits in simply
>>    smaller bpt's.
>>
>>My guess is sjeng is so fast because of the more registers and the bigger bpt in
>>combination with faster (75%) random latency. measured with dieter's dblat
>>program. 229 ns latency at opteron with 500MB cache. versus 400 at my dual K7
>>machine. dual xeon also is giving around 400.
>>
>>Do not underestimate point d.
>>
>>Crafty is a simple program. Main profit directly 33% is 32==>64 bits. Then RAM
>>latency does the vaste rest.
>>
>>Please compare speedup of crafty from K7 to itanium2 cpu:
>>  itanium2 madison 1.3Ghz == 907
>>  MP2400 2.0Ghz == 1156
>>  XP2100 1.73Ghz == 1022
>>  XP1700 1.46Ghz == 867
>>  XP1800 1.6Ghz == 903
>>
>>So for crafty K7 1.6Ghz == Itanium2 madison 1.3Ghz
>>
>>Now diep way more complex program can profit more from complex cpu and
>>the PGO and loops:
>>  K7 2Ghz == itanium2 1.3Ghz
>>
>>Crafty profits less from new generation cpu's than complex commercial programs
>>in short.
>>
>>Point made clear?
>
>You seem to have missed one crucial point.
>Crafty is 64 bit prog, which means it's slow on 32 bit, even I have found that
>doing a lookup is faster than shifting, I simply never do 1<<sq, I use a table
>for that.

I guess that this applies only to 64-bit numbers, and that for a 32-bit number it is better to compute 1<<i (for 0<=i<32) than to use an array.

Correct?

Uri
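To make the question concrete, here is a minimal C sketch of the two approaches being compared. The names (set_mask, init_set_mask, bit64_table, bit64_shift, bit32_shift) are made up for illustration and are not taken from Crafty or any other engine. It only shows what each variant computes; which one is faster on a given CPU and compiler is something you would have to measure.

```c
#include <stdint.h>

/* Hypothetical table: set_mask[sq] has only bit sq set. */
static uint64_t set_mask[64];

/* Fill the table once at program startup. */
static void init_set_mask(void) {
    for (int sq = 0; sq < 64; sq++)
        set_mask[sq] = (uint64_t)1 << sq;
}

/* 64-bit mask by table lookup: a single indexed load of 8 bytes. */
static uint64_t bit64_table(int sq) {
    return set_mask[sq];
}

/* 64-bit mask by shifting. On a 32-bit target the compiler has to
 * build this variable 64-bit shift out of 32-bit operations, which is
 * the case where the table is said to win. Valid for 0 <= sq < 64. */
static uint64_t bit64_shift(int sq) {
    return (uint64_t)1 << sq;
}

/* 32-bit mask by shifting: fits in one register on a 32-bit CPU.
 * Valid for 0 <= i < 32; larger shift counts are undefined in C. */
static uint32_t bit32_shift(int i) {
    return (uint32_t)1 << i;
}
```

Call init_set_mask() once before using bit64_table(). Both 64-bit versions return the same value for every square, so they can be swapped behind a macro and timed inside the real program.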