Author: Sune Fischer
Date: 05:08:57 08/07/03
On August 07, 2003 at 07:49:20, Vincent Diepeveen wrote:

>>65% is disappointing for Crafty. It should get the 65% from register and cache
>>like all other programs, plus whatever speedup the bitboards get.
>
>Wrong.
>
>Non bitboarders like diep have also a lot of potential for more complex cpu's.

Of course.

> a) better usage of more registers (crafty is doing things very simple see code)
> b) better usage of PGO (profile guided optimizations)

Yes, I think this might actually help non-bitboarders more. I got nothing from PGO last I tested. I don't really have that many branches either, mostly tables.

> c) better icache usage (crafty fits within icache, nothing to optimize)

Are you sure about that? The rotated tables alone are 512 kB afaik.

> d) better bpt (branch prediction table) usage. crafty fits in simply
>    smaller bpt's.
>
>My guess is sjeng is so fast because of the more registers and the bigger bpt in
>combination with faster (75%) random latency. measured with dieter's dblat
>program. 229 ns latency at opteron with 500MB cache. versus 400 at my dual K7
>machine. dual xeon also is giving around 400.
>
>Do not underestimate point d.
>
>Crafty is a simple program. Main profit directly 33% is 32==>64 bits. Then RAM
>latency does the vaste rest.
>
>Please compare speedup of crafty from K7 to itanium2 cpu:
> itanium2 madison 1.3Ghz == 907
> MP2400 2.0Ghz == 1156
> XP2100 1.73Ghz == 1022
> XP1700 1.46Ghz == 867
> XP1800 1.6Ghz == 903
>
>So for crafty K7 1.6Ghz == Itanium2 madison 1.3Ghz
>
>Now diep way more complex program can profit more from complex cpu and
>the PGO and loops:
> K7 2Ghz == itanium2 1.3Ghz
>
>Crafty profits less from new generation cpu's than complex commercial programs
>in short.
>
>Point made clear?

You seem to have missed one crucial point. Crafty is a 64 bit prog, which means it's slow on 32 bit. Even I have found that doing a lookup is faster than shifting: I simply never do 1<<sq, I use a table for that.
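
To make the lookup-versus-shift point concrete, here is a minimal sketch in C, not code from Crafty or from my program. The names `Bitboard`, `set_mask`, `init_set_mask`, `bit_by_shift` and `bit_by_lookup` are made up for illustration. The assumption is a 32 bit target, where a 64 bit shift by a variable count has to be put together from 32 bit operations, while the table version is a single indexed load.

```c
#include <stdint.h>

typedef uint64_t Bitboard;

/* Hypothetical table: one precomputed single-bit mask per square 0..63. */
static Bitboard set_mask[64];

/* Fill the table once at program startup. */
static void init_set_mask(void) {
    for (int sq = 0; sq < 64; sq++)
        set_mask[sq] = (Bitboard)1 << sq;
}

/* Variant 1: build the mask with a shift. On a 32 bit target the
   compiler has to synthesize this from 32 bit instructions. */
static Bitboard bit_by_shift(int sq) {
    return (Bitboard)1 << sq;
}

/* Variant 2: read the mask from the table (sq must be in 0..63). */
static Bitboard bit_by_lookup(int sq) {
    return set_mask[sq];
}
```

Both variants return the same mask, so swapping one for the other is purely a speed question. Which one wins depends on the compiler, the target and whether the table stays in cache, so it is worth timing both on the machine in question. In a native 64 bit build the shift is a single instruction and the table may no longer pay off.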
Little things like that are all over the program; when I remove them and go pure 64 bit, I do think a factor of 2 clock for clock is reachable.

>See for crafty specint:

After I saw they tested with 32 bit binaries, I'm not prepared to give them much credit. Frankly, I want Eugene or Hyatt to produce the binary; it needs to be done right or you lose 30% real quick. The pure C version is a lowest common denominator compile, it basically sucks.

I also want to see other bitboard progs. I'm not sure Crafty is representative of them all; my program is very different, for better or worse of course.

-S.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.