Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitboard engine speed difference on different processors

Author: Tim Foden

Date: 05:58:33 04/20/03

Go up one level in this thread


On April 19, 2003 at 07:52:43, Uri Blass wrote:

>On April 19, 2003 at 06:35:42, Tony Werten wrote:
>
>>On April 19, 2003 at 05:08:40, Uri Blass wrote:
>>
>>>On April 19, 2003 at 04:54:29, Tony Werten wrote:
>>>
>>>>On April 19, 2003 at 04:07:37, Uri Blass wrote:
>>>>
>>>>>On April 19, 2003 at 03:37:06, Tony Werten wrote:
>>>>>
>>>>>>On April 17, 2003 at 20:35:35, James Robertson wrote:
>>>>>>
>>>>>>>Out of curiosity I tested just the move generation and basic board functions of
>>>>>>>my bitboard chess program on several different computers. My home computer is a
>>>>>>>Pentium 933mhz, and the other computers I used were Athlons in the 1.6ghz range.
>>>>>>>
>>>>>>>My program's move generator runs at roughly the same speed on both systems. I
>>>>>>>was surprised and tested using several different compilers (VC5, VC6, .NET,
>>>>>>>gcc), under Windows and under Linux. To compare more easily, I wrote a simple
>>>>>>>non-bitboard move generator and tested this on all of the machines. The speed
>>>>>>>differences scaled with the speed of the processors, which seemed logical.
>>>>>>>However, I still cannot explain why the bitboard functions are so much slower on
>>>>>>>the faster computers. The only difference I can see is that my home computer is
>>>>>>>a pentium and the others are athlons.
>>>>>>>
>>>>>>>It seems strange that this would make such a large difference. Can anyone give
>>>>>>>any reasons why? I used no assembly, just C/C++ code, with all the default
>>>>>>>compiler options on all tests.
>>>>>>
>>>>>>I think it's because bitboards tend to fill up the caches, so memory becomes the
>>>>>>bottleneck.
>>>>>>
>>>>>>With other approaches this doesn't happen, ( until you add the big stuff like
>>>>>>eval ) so all things stay in chache  wich makes it (almost) only processor
>>>>>>limited.
>>>>>>
>>>>>>Tony
>>>>>
>>>>>Does it mean that bitboard chess programs or programs with big evaluation are
>>>>>basically optimized for the old hardware because they cannot get much from new
>>>>>hardware?
>>>>
>>>>No, because you're doing more then, wich can go faster with faster hardware. ie
>>>>You don't reach the move generation bottleneck, but have others.
>>>>
>>>>But if all you are doing, has a memory bottleneck (wich happens when only
>>>>generating moves with bitboards ), a faster cpu won't help very much.
>>>>
>>>>Tony
>>>
>>>The question is how can I identify memory bottleneck.
>>>
>>>I use bitboards only for pawn structure but I have a big move generator(some
>>>thousands of C lines).
>>
>>I don't think the amount of code counts. It's the amount of datastructures it
>>operates on.
>
>In that case I do not understand why bitboard cause a problems.
>
>How much data structure do you have with bitboards.
>
>some 64 bit varaible and some arrays does not seem to me much relative to the
>hash tables that is the biggest data structure that programs use.
>
>If size is not the amount of data structure then how do you calculate that
>amount?

In Green Light the main lookup tables required for the rotated bitboard
implementation use 512KB of memory.

UINT64  m_rankMoves[64][256];  // 128K pos/mask
UINT64  m_fileMoves[64][256];  // 128K pos/mask
UINT64  m_diag1Moves[64][256]; // 128K pos/mask
UINT64  m_diag2Moves[64][256]; // 128K pos/mask

it is certainly possible to reduce this, if you allow the code that generates
moves to do more lookups in smaller tables, and some extra logic operations.  I
have another program I am developing, that also uses the same rotated bitboards
as Green Light, but only uses these tables:

BYTE    s_patMoves[8][256];    // 2.00K [R/F][pattern]
UINT64  s_extendVert[256];     // 2.00K [pattern]
UINT64  s_extendHorz[256];     // 2.00K [pattern]

const UINT64 s_rankMask[8];    // 0.06K [file]
const UINT64 s_fileMask[8];    // 0.06K [rank]
const UINT64 s_diagAMask[15];  // 0.12K [diagA]
const UINT64 s_diagHMask[15];  // 0.12K [diagH]

It currently generates moves at about 90% of the speed of Green Light.  I have
yet to measure whether the better cache performance would make a difference to
the chess program overall (so far this new program only generates moves and
performs perft).

Green Light movegen perf:

Generate moves:
Iterations:        1000000  Time: 0.751
Iterations/sec:  1331557.9  Moves/sec: 26631158.5
Generate, make, unmake moves:
Iterations:        1000000  Time: 3.545
Iterations/sec:   282087.4  Moves/sec: 5641748.9


New program movegen perf:

Generate moves:
Iterations:        1000000  Time: 0.841
Iterations/sec:  1189060.6  Moves/sec: 23781212.8
Generate/make/undo moves:
Iterations:        1000000  Time: 5.608
Iterations/sec:   178316.7  Moves/sec: 3566333.8

Cheers, Tim.




This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.