Author: Eugene Nalimov
Date: 10:21:06 04/15/04
On April 15, 2004 at 12:19:47, Gian-Carlo Pascutto wrote:
>On April 15, 2004 at 11:14:25, Eugene Nalimov wrote:
>
>>...or that Fritz has less instruction-level parallelism, so there is lot of
>>idle execution units. Crafty is special due to bitboards...
>
>Fritz being hand-optimized assembly, that sounds a bit unlikely.
I am not saying Fritz is not optimized. I am only saying that with bitboards
on a 32-bit architecture you get more ILP "for free".
For example, something like
if (BK_bitboard & (WP_Attacks|WN_Attacks|WB_Attacks|WR_Attacks|WQ_Attacks))
...
would be compiled on x86 into
mov eax, WP_Attacks
mov edx, WP_Attacks+4
or eax, WN_Attacks
or edx, WN_Attacks+4
or eax, WB_Attacks
or edx, WB_Attacks+4
or eax, WR_Attacks
or edx, WR_Attacks+4
or eax, WQ_Attacks
or edx, WQ_Attacks+4
and eax, BK_bitboard
and edx, BK_bitboard+4
or eax, edx
jz ...
As you can see, until the final test/branch there are 2 completely independent
(intermixed) instruction streams that can be executed in parallel. I doubt you
have such situations very often in non-bitboard programs.
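To make the two streams visible at the source level, here is a minimal C sketch of the same test. The Halves type, split() and king_attacked() are names I made up for illustration; they are not from Crafty or any other engine. A 32-bit compiler does this splitting by itself for a 64-bit integer; the sketch only writes it out by hand.

```c
#include <stdint.h>

/* Hypothetical helper type: a 64-bit bitboard seen as two 32-bit halves,
   the way a 32-bit x86 compiler has to handle it. */
typedef struct { uint32_t lo, hi; } Halves;

static Halves split(uint64_t b)
{
    Halves h;
    h.lo = (uint32_t)b;          /* bits 0..31  */
    h.hi = (uint32_t)(b >> 32);  /* bits 32..63 */
    return h;
}

/* Returns 1 if the black king's square is in any white attack set. */
static int king_attacked(uint64_t bk, uint64_t wp, uint64_t wn,
                         uint64_t wb, uint64_t wr, uint64_t wq)
{
    Halves k = split(bk), p = split(wp), n = split(wn);
    Halves b = split(wb), r = split(wr), q = split(wq);

    /* Chain 1: low halves only (eax in the listing). */
    uint32_t lo = (p.lo | n.lo | b.lo | r.lo | q.lo) & k.lo;

    /* Chain 2: high halves only (edx in the listing). */
    uint32_t hi = (p.hi | n.hi | b.hi | r.hi | q.hi) & k.hi;

    /* The two chains meet only here, in the final test. */
    return (lo | hi) != 0;
}
```

Neither chain reads a value produced by the other, so nothing forces the CPU to finish one before starting the other.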
The same is true of branch mispredicts. Bitboard programs (usually) have fewer
conditional branches in move generation, some parts of evaluation, etc., and
thus fewer mispredicted branches -- another HT opportunity.
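As one illustration of how branches disappear, here is a sketch of the common shift-and-mask way to get all squares attacked by white pawns. It assumes the square numbering a1 = bit 0, h1 = bit 7, a8 = bit 56, which is my choice for the example; the function and macro names are also made up, and this is not code from any particular engine.

```c
#include <stdint.h>

/* Assumed mapping: a1 = bit 0, b1 = bit 1, ..., h8 = bit 63. */
#define FILE_A UINT64_C(0x0101010101010101)
#define FILE_H UINT64_C(0x8080808080808080)

/* All squares attacked by the white pawns in 'pawns'.
   No loop over pawns and no edge-of-board tests: the file masks
   drop the pawns whose capture would wrap around the board. */
static uint64_t white_pawn_attacks(uint64_t pawns)
{
    uint64_t toward_a = (pawns & ~FILE_A) << 7;  /* one rank up, one file left  */
    uint64_t toward_h = (pawns & ~FILE_H) << 9;  /* one rank up, one file right */
    return toward_a | toward_h;
}
```

A square-by-square version would need a loop over the pawns plus an edge check for each capture direction, and every one of those is a conditional branch the predictor can get wrong.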
Of course bitboard programs have to pay for that -- their working set is larger.
But with the larger caches on Opteron that should be less of an issue...
Thanks,
Eugene
>--
>GCP