Author: Robert Hyatt
Date: 08:17:46 08/07/03
On August 07, 2003 at 08:08:57, Sune Fischer wrote:

>On August 07, 2003 at 07:49:20, Vincent Diepeveen wrote:
>
>>>65% is disappointing for Crafty. It should get the 65% from register and cache
>>>like all other programs, plus whatever speedup the bitboards get.
>>
>>Wrong.
>>
>>Non bitboarders like diep have also a lot of potential for more complex cpu's.
>
>Of course.
>
>> a) better usage of more registers (crafty is doing things very simple see code)
>> b) better usage of PGO (profile guided optimizations)
>
>Yes I think this might actually help non-bitboarders more, I got nothing from
>PGO last I tested, I don't really have that many branches either, mostly tables.

Crafty gains about 15% from PGO on the Intel compiler. Of course, if you read
Vincent's "profiling methodology" you might see why it fails for him.

>
>> c) better icache usage (crafty fits within icache, nothing to optimize)
>
>Are you sure about that?
>The rotated tables alone are 512 kB afaik.
>
>> d) better bpt (branch prediction table) usage. crafty fits in simply
>> smaller bpt's.
>>
>>My guess is sjeng is so fast because of the more registers and the bigger bpt in
>>combination with faster (75%) random latency. measured with dieter's dblat
>>program. 229 ns latency at opteron with 500MB cache. versus 400 at my dual K7
>>machine. dual xeon also is giving around 400.
>>
>>Do not underestimate point d.
>>
>>Crafty is a simple program. Main profit directly 33% is 32==>64 bits. Then RAM
>>latency does the vast rest.
>>
>>Please compare speedup of crafty from K7 to itanium2 cpu:
>> itanium2 madison 1.3Ghz == 907
>> MP2400 2.0Ghz == 1156
>> XP2100 1.73Ghz == 1022
>> XP1700 1.46Ghz == 867
>> XP1800 1.6Ghz == 903
>>
>>So for crafty K7 1.6Ghz == Itanium2 madison 1.3Ghz
>>
>>Now diep way more complex program can profit more from complex cpu and
>>the PGO and loops:
>> K7 2Ghz == itanium2 1.3Ghz
>>
>>Crafty profits less from new generation cpu's than complex commercial programs
>>in short.
>>
>>Point made clear?
>
>You seem to have missed one crucial point.
>Crafty is 64 bit prog, which means it's slow on 32 bit, even I have found that
>doing a lookup is faster than shifting, I simply never do 1<<sq, I use a table
>for that. Little things like that are all over the program, when I remove this
>and go pure 64 bit I do think a factor 2 clock for clock is reachable.
>
>>See for crafty specint:
>
>After I saw they tested with 32 bit binaries, I'm not prepared to give them much
>credit.
>Frankly I want Eugene or Hyatt to produce the binary, needs to be done right or
>you lose 30% real quick. The pure C version is a lowest common denominator
>compile, it sucks basically.
>
>I also want to see other bitboard progs, I'm not sure Crafty is representative
>for all, my program is very different, for better or worse of course.
>
>-S.
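
For anyone following along, the "never do 1<<sq, use a table" trick Sune mentions might look roughly like the C sketch below. The names (set_mask, init_set_mask, bit_by_shift, bit_by_table) are made up for illustration and are not taken from Crafty or from Sune's program; whether the table actually wins depends on the compiler and CPU, so treat it as something to measure rather than a rule.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout: nothing here is copied from Crafty
   or any other engine mentioned in the thread. */

uint64_t set_mask[64];   /* set_mask[sq] has only bit sq set */

/* Fill the table once at program startup. */
void init_set_mask(void) {
    for (int sq = 0; sq < 64; sq++)
        set_mask[sq] = (uint64_t)1 << sq;
}

/* Direct shift: a single instruction on a 64-bit CPU, but a 32-bit
   target has to build the 64-bit result from several 32-bit operations. */
uint64_t bit_by_shift(int sq) {
    assert(sq >= 0 && sq < 64);
    return (uint64_t)1 << sq;
}

/* Table lookup: reads the precomputed mask from memory instead of
   shifting, which is the variant described as faster on 32-bit. */
uint64_t bit_by_table(int sq) {
    assert(sq >= 0 && sq < 64);
    return set_mask[sq];
}
```

Call init_set_mask() once before any lookup; after that bit_by_table(sq) and bit_by_shift(sq) return the same mask for every square 0..63, so the two can be swapped behind a macro and timed on each target.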
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.