Author: Vincent Diepeveen
Date: 05:46:43 08/07/03
On August 07, 2003 at 08:08:57, Sune Fischer wrote:
>On August 07, 2003 at 07:49:20, Vincent Diepeveen wrote:
>
>>>65% is disappointing for Crafty. It should get the 65% from register and cache
>>>like all other programs, plus whatever speedup the bitboards get.
>>
>>Wrong.
>>
>>Non bitboarders like diep have also a lot of potential for more complex cpu's.
>
>Of course.
>
>> a) better usage of more registers (crafty is doing things very simple see code)
>> b) better usage of PGO (profile guided optimizations)
>
>Yes I think this might actually help non-bitboarders more, I got nothing from
>PGO last I tested, I don't really have that many branches either, mostly tables.
>
>> c) better icache usage (crafty fits within icache, nothing to optimize)
>
>Are you sure about that?
>The rotated tables alone are 512 kB afaik.
I said icache, Sune. You are referring to the dcache.
>> d) better bpt (branch prediction table) usage. crafty fits in simply
>> smaller bpt's.
>>
>>My guess is sjeng is so fast because of the more registers and the bigger bpt in
>>combination with faster (75%) random latency. measured with dieter's dblat
>>program. 229 ns latency at opteron with 500MB cache. versus 400 at my dual K7
>>machine. dual xeon also is giving around 400.
>>
>>Do not underestimate point d.
>>
>>Crafty is a simple program. The main direct profit, 33%, is 32==>64 bits. Then RAM
>>latency does the vast rest.
>>
>>Please compare speedup of crafty from K7 to itanium2 cpu:
>> itanium2 madison 1.3Ghz == 907
>> MP2400 2.0Ghz == 1156
>> XP2100 1.73Ghz == 1022
>> XP1700 1.46Ghz == 867
>> XP1800 1.6Ghz == 903
>>
>>So for crafty K7 1.6Ghz == Itanium2 madison 1.3Ghz
>>
>>Now diep way more complex program can profit more from complex cpu and
>>the PGO and loops:
>> K7 2Ghz == itanium2 1.3Ghz
>>
>>Crafty profits less from new generation cpu's than complex commercial programs
>>in short.
>>
>>Point made clear?
>You seem to have missed one crucial point.
>Crafty is 64 bit prog, which means it's slow on 32 bit, even I have found that
>doing a lookup is faster than shifting, I simply never do 1<<sq, I use a table
That's 33% at most. Just look at what the alpha 21264c scores versus the K7, which
has a similar architecture: about a 33% difference.
Itanium is a new-generation complex cpu, too complex for bitboarders, and its
latency to main memory isn't very impressive, which is bad luck for you.
What you need is fast RAM.
Ever measured the difference between RAM speeds for your program, Sune?
You should.
So measure LATENCY differences. If machine X has 220 ns latency and some other
machine has 400 ns latency, just measure how much that speeds things up for you.
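For anyone who wants to try that measurement: below is a minimal sketch of a random-latency test in C. It is not dieter's dblat program, just the usual pointer-chasing idea, and all the names (`build_random_cycle`, `chase`, `measure_latency_ns`) are made up for this example. It links a big buffer into one random cycle and then follows it, so every load has to wait for the previous one. The number you get includes TLB misses, and `clock()` is a coarse timer, so use many steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Small xorshift PRNG so the sketch does not depend on rand()'s range. */
static uint64_t rng_state = 88172645463325252ULL;

static uint64_t next_random(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Fill chain[] so that following chain[i] visits every slot exactly once
   before returning to the start (Sattolo's algorithm: one single cycle). */
void build_random_cycle(size_t *chain, size_t n) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        chain[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(next_random() % i);   /* 0 <= j < i */
        size_t tmp = chain[i];
        chain[i] = chain[j];
        chain[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads and return the final index.
   Each load needs the result of the previous one, so they cannot overlap. */
size_t chase(const size_t *chain, size_t steps) {
    size_t pos = 0;
    while (steps--)
        pos = chain[pos];
    return pos;
}

/* Average nanoseconds per dependent load over a buffer of `bytes` bytes.
   Returns a negative value on bad arguments or allocation failure. */
double measure_latency_ns(size_t bytes, size_t steps) {
    size_t n = bytes / sizeof(size_t);
    if (n == 0 || steps == 0)
        return -1.0;
    size_t *chain = malloc(n * sizeof *chain);
    if (chain == NULL)
        return -1.0;
    build_random_cycle(chain, n);

    clock_t start = clock();
    volatile size_t sink = chase(chain, steps);  /* volatile keeps the loop alive */
    clock_t stop = clock();
    (void)sink;

    free(chain);
    return 1e9 * (double)(stop - start) / CLOCKS_PER_SEC / (double)steps;
}
```

Call it with a buffer far larger than the caches, for instance `measure_latency_ns((size_t)256 << 20, 20000000)`, on both machines, and then compare your engine's speed on the same two machines.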
>for that. Little things like that are all over the program, when I remove this
>and go pure 64 bit I do think a factor 2 clock for clock is reachable.
No. Perhaps 33% for just going from 32 to 64 bits. You're really underestimating
how fast the overhead runs on the K7 here.
Those instructions cause very few branch mispredictions, few register stalls,
etc. Doing 64-bit operations on a 32-bit processor just costs a few more
instructions.
The data structure itself, however, is slow compared to non-bitboard.
That's a different discussion, though.
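To make the "few more instructions" concrete, here is a small sketch in C of the two ways to get a single-square bitboard that Sune mentions: the 64-bit shift `1<<sq` and a table lookup. The names (`Bitboard`, `set_mask`, `bit_by_shift`, `bit_by_table`) are invented for this example and come from neither Crafty nor Sune's program. On a 32-bit cpu the compiler has to build the variable 64-bit shift out of 32-bit instructions, while the table costs a memory load. Which one wins on a given machine is something to measure, not assume.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Bitboard;

/* One precomputed single-bit mask per square (64 entries * 8 bytes = 512 bytes). */
static Bitboard set_mask[64];

/* Fill the table once at startup. */
void init_set_mask(void) {
    for (int sq = 0; sq < 64; sq++)
        set_mask[sq] = (Bitboard)1 << sq;
}

/* Variant 1: compute the mask with a 64-bit shift by a variable count. */
Bitboard bit_by_shift(int sq) {
    return (Bitboard)1 << sq;
}

/* Variant 2: fetch the mask from the table (requires init_set_mask first). */
Bitboard bit_by_table(int sq) {
    return set_mask[sq];
}

/* Typical uses: set a square and test a square on a bitboard. */
Bitboard set_square(Bitboard b, int sq) {
    return b | bit_by_table(sq);
}

int is_set(Bitboard b, int sq) {
    return (b & bit_by_table(sq)) != 0;
}
```

Both variants return the same value for every square from 0 to 63, so they can be swapped freely when timing them.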
>>See for crafty specint:
>
>After I saw they tested with 32 bit binaries, I'm not prepared to give them much
>credit.
>Frankly I want Eugene or Hyatt to produce the binary, needs to be done right or
>you lose 30% real quick. The pure C version is a lowest common denominator
>compile, it sucks basically.
I wouldn't even trust Hyatt to produce a text file with speedup numbers, but aside
from that, yes, Eugene probably has some cool crafty executables.
>I also want to see other bitboard progs, I'm not sure Crafty is representative
Crafty is a very poor example of programming:
- inline assembly everywhere
- no nice loops; everything is written out, even separately for black & white
- every piece written out
- it doesn't compile very well with gcc or visual c++ thanks to
all of that hacking, and frankly hyatt doesn't care.
To quote him: "My dual P4 xeon is what counts".
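As a rough illustration of the "written out" versus "nice loops" point, here is a toy material count in C done both ways. This is not Crafty's code. The piece enum, the centipawn values and the function names are all hypothetical and picked only for the example. The looped version gives the compiler (and PGO) one body to work on, while the written-out version repeats it once per piece type.

```c
#include <assert.h>
#include <stdint.h>

enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, PIECE_TYPES };

/* Hypothetical centipawn values, chosen only for illustration. */
static const int piece_value[PIECE_TYPES] = { 100, 300, 300, 500, 900 };

/* Count set bits by clearing the lowest one until none remain. */
int popcount64(uint64_t b) {
    int n = 0;
    while (b) {
        b &= b - 1;
        n++;
    }
    return n;
}

/* Style 1: every piece type written out by hand. */
int material_written_out(const uint64_t pieces[PIECE_TYPES]) {
    int total = 0;
    total += 100 * popcount64(pieces[PAWN]);
    total += 300 * popcount64(pieces[KNIGHT]);
    total += 300 * popcount64(pieces[BISHOP]);
    total += 500 * popcount64(pieces[ROOK]);
    total += 900 * popcount64(pieces[QUEEN]);
    return total;
}

/* Style 2: one loop over the piece types, driven by the value table. */
int material_loop(const uint64_t pieces[PIECE_TYPES]) {
    int total = 0;
    for (int t = 0; t < PIECE_TYPES; t++)
        total += piece_value[t] * popcount64(pieces[t]);
    return total;
}
```

Both functions return the same total for any position; the difference is only in how much code has to be written and kept in sync.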
>for all, my program is very different, for better or worse of course.
>-S.
Let's hope you don't make the same mistakes. The bad example crafty sets is really
that people start writing their own assembly. As if bitboards are fast anyway :)