Computer Chess Club Archives



Subject: Re: Some new hyper-threading info.

Author: Eugene Nalimov

Date: 10:21:06 04/15/04



On April 15, 2004 at 12:19:47, Gian-Carlo Pascutto wrote:

>On April 15, 2004 at 11:14:25, Eugene Nalimov wrote:
>
>>...or that Fritz has less instruction-level parallelism, so there are a lot of
>>idle execution units. Crafty is special due to bitboards...
>
>Fritz being hand-optimized assembly, that sounds a bit unlikely.

I am not saying Fritz is not optimized. I am only saying that with bitboards
on a 32-bit architecture you get more ILP "for free".

For example, something like

    if (BK_bitboard & (WP_Attacks|WN_Attacks|WB_Attacks|WR_Attacks|WQ_Attacks))
        ...

would be compiled on x86 into

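    mov eax, WP_Attacks      ; low 32 bits of the pawn attacks
    mov edx, WP_Attacks+4    ; high 32 bits of the pawn attacks
    or  eax, WN_Attacks      ; low stream: add knight attacks
    or  edx, WN_Attacks+4    ; high stream: add knight attacks
    or  eax, WB_Attacks
    or  edx, WB_Attacks+4
    or  eax, WR_Attacks
    or  edx, WR_Attacks+4
    or  eax, WQ_Attacks
    or  edx, WQ_Attacks+4
    and eax, BK_bitboard     ; low stream: intersect with the king
    and edx, BK_bitboard+4   ; high stream: intersect with the king
    or  eax, edx             ; the two streams meet only here
    jz  ...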

As you can see, up to the final test/branch there are two completely independent
(interleaved) instruction streams that can be executed in parallel. I doubt you
see such situations very often in non-bitboard programs.
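
The same split can be written out by hand in C. This is only a sketch to make
the two streams visible: the Bitboard32x2 type and the king_attacked() function
are hypothetical names made up for illustration, not code from Crafty or any
other engine, and a real compiler is free to schedule things differently.

```c
#include <stdint.h>

/* Hypothetical layout: a 64-bit bitboard stored as two 32-bit halves,
   which is how a 32-bit x86 compiler has to treat it. */
typedef struct {
    uint32_t lo;   /* squares 0..31  */
    uint32_t hi;   /* squares 32..63 */
} Bitboard32x2;

/* Returns 1 if `king` intersects the union of the five attack sets,
   0 otherwise. The `lo` chain and the `hi` chain never read each
   other's results until the final OR, so an out-of-order CPU can
   execute them side by side. */
int king_attacked(Bitboard32x2 king, const Bitboard32x2 attacks[5])
{
    uint32_t lo = attacks[0].lo;
    uint32_t hi = attacks[0].hi;

    for (int i = 1; i < 5; i++) {
        lo |= attacks[i].lo;   /* stream 1 */
        hi |= attacks[i].hi;   /* stream 2 */
    }

    lo &= king.lo;
    hi &= king.hi;

    return (lo | hi) != 0;     /* the only point where the streams meet */
}
```

Each statement in the loop body depends only on the previous value of its own
half, which is exactly the dependency pattern of the eax and edx chains in the
listing above.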

The same is true of branch mispredicts. Bitboard programs (usually) have fewer
conditional branches in move generation, some parts of evaluation, etc., which
reduces the number of mispredicted branches -- another HT opportunity.
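
To illustrate the "fewer conditional branches" point, here is a minimal sketch
of a branch-free knight attack computation. It assumes a square mapping of
a1 = bit 0, h1 = bit 7, a8 = bit 56; the function name and mask names are
made up for this example and are not taken from any particular engine.

```c
#include <stdint.h>

/* File masks, assuming a1 = bit 0 and h1 = bit 7 (an assumption of
   this sketch). They discard shifts that wrapped around a board edge. */
#define NOT_A_FILE  0xfefefefefefefefeULL
#define NOT_AB_FILE 0xfcfcfcfcfcfcfcfcULL
#define NOT_H_FILE  0x7f7f7f7f7f7f7f7fULL
#define NOT_GH_FILE 0x3f3f3f3f3f3f3f3fULL

/* Attack set of all knights in `knights` at once. There is no loop and
   no "is this target on the board?" test: edge handling is done with
   masks, so the function contains no conditional branches. */
uint64_t knight_attacks(uint64_t knights)
{
    uint64_t a = 0;
    a |= (knights << 17) & NOT_A_FILE;   /* up 2,   right 1 */
    a |= (knights << 10) & NOT_AB_FILE;  /* up 1,   right 2 */
    a |= (knights >>  6) & NOT_AB_FILE;  /* down 1, right 2 */
    a |= (knights >> 15) & NOT_A_FILE;   /* down 2, right 1 */
    a |= (knights << 15) & NOT_H_FILE;   /* up 2,   left 1  */
    a |= (knights <<  6) & NOT_GH_FILE;  /* up 1,   left 2  */
    a |= (knights >> 10) & NOT_GH_FILE;  /* down 1, left 2  */
    a |= (knights >> 17) & NOT_H_FILE;   /* down 2, left 1  */
    return a;
}
```

An array-based generator would typically check each of the eight target
squares for being on the board, one conditional branch per target; here the
eight shift/mask pairs are also independent of each other, so they add ILP
as well.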

Of course bitboard programs have to pay for that -- their working set is larger.
But with the larger caches on Opteron that should be less of an issue...

Thanks,
Eugene

>--
>GCP




Last modified: Thu, 15 Apr 21 08:11:13 -0700
