Author: Gerd Isenberg
Date: 12:26:10 04/15/04
On April 15, 2004 at 13:21:06, Eugene Nalimov wrote:
>On April 15, 2004 at 12:19:47, Gian-Carlo Pascutto wrote:
>
>>On April 15, 2004 at 11:14:25, Eugene Nalimov wrote:
>>
>>>...or that Fritz has less instruction-level parallelism, so there are a lot of
>>>idle execution units. Crafty is special due to bitboards...
>>
>>Fritz being hand-optimized assembly, that sounds a bit unlikely.
>
>I am not saying Fritz is not optimized. I am saying only that with bitboards
>on a 32-bit architecture you get more ILP "for free".
>
>For example, something like
>
> if (BK_bitboard & (WP_Attacks|WN_Attacks|WB_Attacks|WR_Attacks|WQ_Attacks))
> ...
>
>would be compiled on x86 into
>
> mov eax, WP_Attacks
> mov edx, WP_Attacks+4
> or eax, WN_Attacks
> or edx, WN_Attacks+4
> or eax, WB_Attacks
> or edx, WB_Attacks+4
> or eax, WR_Attacks
> or edx, WR_Attacks+4
> or eax, WQ_Attacks
> or edx, WQ_Attacks+4
> and eax, BK_bitboard
> and edx, BK_bitboard+4
> or eax, edx
> jz ...
But I hope that on x86-64 this code is faster, despite the additional prefix
bytes and no longer having two independent instruction streams ;-)
mov rax, WP_Attacks
or rax, WN_Attacks
or rax, WB_Attacks
or rax, WR_Attacks
or rax, WQ_Attacks
and rax, BK_bitboard
jz ...
I have no idea whether "Superforwarding" is an issue here or only with the
floating-point ALUs.
Is this one better?
mov rax, WP_Attacks
mov rdx, WN_Attacks
or rax, WB_Attacks
or rdx, WR_Attacks
or rax, WQ_Attacks
or rax, rdx
and rax, BK_bitboard
jz ...
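
In C terms, the two 64-bit orderings above might look roughly like the sketch below. The type name U64 and both function names are hypothetical, made up for illustration, and the attack sets are passed as parameters rather than read from globals. An optimizing compiler is free to reassociate the ORs, so the source grouping is only a hint, not a guarantee of the instruction order.

```c
#include <stdint.h>

typedef uint64_t U64;  /* hypothetical bitboard type: one bit per square */

/* Single dependency chain: every OR waits for the previous result,
   like the first x86-64 listing (everything accumulates in rax). */
int king_attacked_serial(U64 bk, U64 wp, U64 wn, U64 wb, U64 wr, U64 wq)
{
    U64 attacks = wp;
    attacks |= wn;
    attacks |= wb;
    attacks |= wr;
    attacks |= wq;
    return (attacks & bk) != 0;
}

/* Two shorter chains merged at the end, like the second listing
   (one chain in rax, one in rdx, then "or rax, rdx"). */
int king_attacked_split(U64 bk, U64 wp, U64 wn, U64 wb, U64 wr, U64 wq)
{
    U64 a = wp | wb | wq;   /* first chain  */
    U64 b = wn | wr;        /* second chain */
    return ((a | b) & bk) != 0;
}
```

Both functions compute the same result; only the shape of the dependency graph differs, which is the point of the question.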
Thanks,
Gerd
>
>As you can see, up to the final test/branch there are 2 completely independent
>(intermixed) instruction streams that can be executed in parallel. I doubt you
>have such situations very often in non-bitboard programs.
>
>The same is true of branch mispredicts. Bitboard programs (usually) have fewer
>conditional branches in move generation, some parts of evaluation, etc., and
>thus fewer mispredicted branches -- another HT opportunity.
>
>Of course bitboard programs have to pay for that -- their working set is larger.
>But with the larger caches on Opteron that should be less of an issue...
>
>Thanks,
>Eugene
>
>>--
>>GCP