Computer Chess Club Archives

Subject: Re: WCCC: Almost no hardware advantage for Crafty

Author: Gerd Isenberg
Date: 01:25:46 06/30/04
On June 30, 2004 at 03:19:12, Tony Werten wrote:

>On June 29, 2004 at 18:03:10, Robert Hyatt wrote:
>
>>On June 29, 2004 at 16:04:23, Vincent Diepeveen wrote:
>>
>>>On June 29, 2004 at 12:52:43, Robert Hyatt wrote:
>>>
>>>>On June 29, 2004 at 12:31:00, Vincent Diepeveen wrote:
>>>>
>>>>>On June 29, 2004 at 09:00:51, Ingo Bauer wrote:
>>>>>
>>>>>>On June 29, 2004 at 08:26:15, Zach Wegner wrote:
>>>>>>
>>>>>>>One important point is that crafty uses bitboards, so it will have an additional
>>>>>>>speedup on a 64 bit processor.
>>>>>>
>>>>>>http://www.talkchess.com/forums/1/message.html?372849
>>>>>>
>>>>>>According to yesterday's news it's ~47%. Assuming that the hardware is equal and
>>>>>>that double speed gives 60 ELO Crafty wins 30 ELO. We will see soon if this will
>>>>>>be enough.
>>>>>>
>>>>>>Bye Ingo
>>>>>
>>>>>The 32 bits version is using 8 registers.
>>>>>The 64 bits version uses 16 registers.
>>>>>
>>>>>And another few tiny differences.
>>>>>
>>>>>Crafty loses always 1 register to index which thread it is using, so the
>>>>>advantage of going from 8 to 16 is a big one.
>>>>>
>>>>>Then i do not know whether the 64 bits version uses inline assembly versus the
>>>>>32 bits version not using it and the compiler versions and type of compilers
>>>>>used is unclear.
>>>>
>>>>
>>>>If you are going to write about what you don't know, we are going to be here all
>>>>day.
>>>>
>>>>the pointer cost me 3-4% when I added it a few years back.  That is not going to
>>>>be a "big one" when moving to 16 registers.
>>>>
>>>>Both versions use inline assembly for FirstOne() and LastOne() and that's it.
>>>>There is no other assembly in Crafty other than my spinlock code for the SMP
>>>>stuff...
>>>>
>>>>On windows there is no inline asm at all as windows has a built-in intrinsic to
>>>>get to BSF/BSR...
>>>
>>>Do you run in windows at the world champs 2004?
>>
>>If I could, yes.  the compiled executables Eugene produces are faster than
>>anything I can do on linux..  And XP runs crafty just as well, and Eugene's numa
>>memory stuff works just fine with no twiddling as I have to do on linux from
>>version to version..
>>
>>However, here is the huge amount of inline asm I have in Crafty:
>>
>>int static __inline__ FirstOne(long word)
>>{
>>  long      dummy, dummy2;
>>
>>asm("          bsrq    %0, %1"                       "\n\t"
>>    "          jnz     1f"                           "\n\t"
>>    "          movq    $-1, %1"                      "\n\t"
>>    "1:        movq    $63, %0"                      "\n\t"
>>    "          subq    %1, %0"                       "\n\t"
>>    :"=r&"(dummy), "=r&" (dummy2)
>>    :"0"((long) (word))
>>    :"cc");
>>  return (dummy);
>>}
>>
>>
>>int static __inline__ LastOne(long word)
>>{
>>  long      dummy, dummy2;
>>
>>asm("          bsfq    %0, %1"                       "\n\t"
>>    "          jnz     1f"                           "\n\t"
>>    "          movq    $-1, %1"                      "\n\t"
>>    "1:        movq    $63, %0"                      "\n\t"
>>    "          subq    %1, %0"                       "\n\t"
>>    :"=r&"(dummy), "=r&" (dummy2)
>>    :"0"((long) (word))
>>    :"cc");
>>  return (dummy);
>>}
>>
>>
>>Ten whole assembler instructions.  And had I renumbered my bits in the right
>>way, this would be a grand total of _two_ assembler instructions rather than 10.
>
>I've read that before. What is the right way ?
>
>H8 at bit0 and A1 at bit63 ? And why ?
>
>Tony
>
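
On the instruction count: the quoted asm computes 63 minus the BSR/BSF result and special-cases an empty word. A rough sketch of the same results with the GCC/Clang builtins follows. The function names are made up for illustration, and the last two functions only assume that "renumbered" means bit index == square index, which the post does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Same result as the quoted FirstOne(): 63 - BSR(word), 64 if word is empty. */
int first_one_portable(uint64_t word)
{
    return word ? __builtin_clzll(word) : 64;
}

/* Same result as the quoted LastOne(): 63 - BSF(word), 64 if word is empty. */
int last_one_portable(uint64_t word)
{
    return word ? 63 - __builtin_ctzll(word) : 64;
}

/* Assumed renumbering (bit index == square index): the scan result is the
   square itself, so no subtraction is needed. Callers must pass a nonempty
   word, as with a bare BSF. */
int lowest_square(uint64_t word)
{
    assert(word != 0);
    return __builtin_ctzll(word);
}

/* Counterpart for the most significant set bit; this is what a bare BSR
   returns. */
int highest_square(uint64_t word)
{
    assert(word != 0);
    return 63 - __builtin_clzll(word);
}
```

With the empty-word test left to the caller, each scan can come down to a single BSF or BSR, which would match the "two instructions" figure.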

I guess Bob's scheme is easier to map inside a human's brain ;-)
No further mirroring required. If you write down the hexadecimal or binary
representation of a bitboard, the arithmetical least significant bit 0 is at the
very right end of the number, like the h-file from white's point of view:

E.g. you can read one byte as one particular rank immediately in a natural way,
for instance 0x82:

abcd efgh
1000 0010B

With my mapping (0==a1, 8==a2, 63==h8) i have to mirror or reverse the bits. So
0x82 becomes 0x41, which might be a pain during debugging.

hgfe dcba
1000 0010B

abcd efgh
0100 0001
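
The mirroring above can be sketched in C as a bit reversal of one rank byte. This is only an illustration: mirror_rank and mirror_file_square are made-up names, and the square helper assumes the file sits in the low three bits of the square index (as in the 0==a1 mapping).

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the 8 bits of one rank byte, so that the a..h order becomes h..a. */
uint8_t mirror_rank(uint8_t r)
{
    r = (uint8_t)(((r & 0xF0u) >> 4) | ((r & 0x0Fu) << 4)); /* swap nibbles   */
    r = (uint8_t)(((r & 0xCCu) >> 2) | ((r & 0x33u) << 2)); /* swap bit pairs */
    r = (uint8_t)(((r & 0xAAu) >> 1) | ((r & 0x55u) << 1)); /* swap neighbors */
    return r;
}

/* Mirror the file of a square index (a<->h, b<->g, ...), keeping the rank.
   Assumes the file is stored in the low three bits of the index. */
int mirror_file_square(int sq)
{
    assert(sq >= 0 && sq < 64);
    return sq ^ 7;
}
```

mirror_rank(0x82) gives 0x41, and applying it again gives 0x82 back.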

A small bitboard viewer with board representations is helpful.
During debugging i can copy 64-bit values from watch or variable inspect windows
to the clipboard and paste them into the viewer.
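
A minimal sketch of what the text side of such a viewer could look like, assuming the 0==a1 ... 63==h8 mapping. format_bitboard is a hypothetical name, not code from any engine or from the viewer mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write an 8x8 text diagram of bb into out: rank 8 on the first line,
   files a..h from left to right, '1' for a set bit and '.' for a clear one.
   out must hold 73 chars: 8 lines of 8 squares plus '\n', then the NUL.
   Assumes bit index == square index with a1 == 0 and h8 == 63. */
void format_bitboard(uint64_t bb, char out[73])
{
    char *p = out;

    assert(out != NULL);
    for (int rank = 7; rank >= 0; --rank) {
        for (int file = 0; file < 8; ++file) {
            int sq = rank * 8 + file;
            *p++ = ((bb >> sq) & 1u) ? '1' : '.';
        }
        *p++ = '\n';
    }
    *p = '\0';
}
```

Feeding it 0x41 and printing the buffer shows a1 and g1 set on the bottom line, so the diagram reads in board order even though the hex value reads the other way round.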

Gerd
Last modified: Thu, 15 Apr 21 08:11:13 -0700