Computer Chess Club Archives

Subject: Re: Some new hyper-threading info.

Author: Gerd Isenberg

Date: 12:26:10 04/15/04

On April 15, 2004 at 13:21:06, Eugene Nalimov wrote:

>On April 15, 2004 at 12:19:47, Gian-Carlo Pascutto wrote:
>
>>On April 15, 2004 at 11:14:25, Eugene Nalimov wrote:
>>
>>>...or that Fritz has less instruction-level parallelism, so there are a lot of
>>>idle execution units. Crafty is special due to bitboards...
>>
>>Fritz being hand-optimized assembly, that sounds a bit unlikely.
>
>I am not saying Fritz is not optimized. I am only saying that with bitboards
>on a 32-bit architecture you get more ILP "for free".
>
>For example, something like
>
>    if (BK_bitboard & (WP_Attacks|WN_Attacks|WB_Attacks|WR_Attacks|WQ_Attacks))
>        ...
>
>would be compiled on x86 into
>
>    mov eax, WP_Attacks
>    mov edx, WP_Attacks+4
>    or  eax, WN_Attacks
>    or  edx, WN_Attacks+4
>    or  eax, WB_Attacks
>    or  edx, WB_Attacks+4
>    or  eax, WR_Attacks
>    or  edx, WR_Attacks+4
>    or  eax, WQ_Attacks
>    or  edx, WQ_Attacks+4
>    and eax, BK_bitboard
>    and edx, BK_bitboard+4
>    or  eax, edx
>    jz  ...

But I hope that on x86-64 this code is faster, despite the additional prefix
bytes and no longer having two independent instruction streams ;-)

    mov rax, WP_Attacks
    or  rax, WN_Attacks
    or  rax, WB_Attacks
    or  rax, WR_Attacks
    or  rax, WQ_Attacks
    and rax, BK_bitboard
    jz  ...

I have no idea whether "Superforwarding" is an issue here or only with
floating-point ALUs.

Or is this one better?

    mov rax, WP_Attacks
    mov rdx, WN_Attacks
    or  rax, WB_Attacks
    or  rdx, WR_Attacks
    or  rax, WQ_Attacks
    or  rax, rdx
    and rax, BK_bitboard
    jz  ...
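
For comparison, here is a rough C sketch of the two 64-bit variants above. The struct and function names are made up for illustration (only the bitboard names come from the thread), and a compiler is free to reassociate the ORs, so the generated code may not match either listing exactly.

```c
#include <stdint.h>

/* Hypothetical container for the bitboards named in the thread. */
typedef struct {
    uint64_t wp, wn, wb, wr, wq;  /* white pawn/knight/bishop/rook/queen attacks */
    uint64_t bk;                  /* black king bitboard */
} Attacks;

/* Single dependency chain: every OR waits for the previous result,
   like the first x86-64 listing (everything accumulates in rax). */
static int king_attacked_serial(const Attacks *a)
{
    uint64_t acc = a->wp;
    acc |= a->wn;
    acc |= a->wb;
    acc |= a->wr;
    acc |= a->wq;
    return (acc & a->bk) != 0;
}

/* Two shorter chains merged at the end, like the second listing
   (rax and rdx accumulate independently, then get ORed together). */
static int king_attacked_split(const Attacks *a)
{
    uint64_t x = a->wp | a->wb | a->wq;  /* "rax" chain */
    uint64_t y = a->wn | a->wr;          /* "rdx" chain */
    return ((x | y) & a->bk) != 0;
}
```

Both functions return the same result for any input; they differ only in the length of the dependency chain, which is the thing being asked about. Whether the split version is actually faster would have to be measured on the target CPU.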


Thanks,
Gerd

>
>As you can see, until the last test/branch there are two completely independent
>(intermixed) instruction streams that can be executed in parallel. I doubt you
>have such situations very often in non-bitboard programs.
>
>The same is true of branch mispredicts. Bitboard programs (usually) have fewer
>conditional branches in move generation, some parts of evaluation, etc., thus
>reducing the number of mispredicted branches -- another HT opportunity.
>
>Of course bitboard programs have to pay for that -- their working set is larger.
>But with the larger caches on Opteron that should be less of an issue...
>
>Thanks,
>Eugene
>
>>--
>>GCP


Last modified: Thu, 15 Apr 21 08:11:13 -0700
