Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitboard vs Offset again ! now after Rybka

Author: Vasik Rajlich

Date: 06:02:11 01/20/06

Go up one level in this thread


On January 20, 2006 at 02:51:01, Chan Rasjid wrote:

>On January 19, 2006 at 14:26:11, Gerd Isenberg wrote:
>
>Thanks Gerd,
>
>As expected, we cannot trick the world. I did not suspect boolean may translate
>to branching of 0/1, am still learning!
>
>Incidentally, I'll like to mention the bitboard vs offset debate. I can now see
>a complete BB implentation as I have started from scratch and now have completed
>a BB implementation except for eval() and 1 line  trigering an assert(). So
>everything's fine. It does not need a BB move generator! rather it is BB
>incremental generation plus partial as we can easily extract 1 move at a time.
>
>In the past Christophe Theron argued against BB pointing out not a single BB
>engine made it to to SSDF list. Now we have Rybka at the top so likely others
>may think a little more about BB.
>
>1) BB_TO_SQ() call - I am not sure this was one of the main contentious issue.
>I dont do rotated,just improvise. I doubt rotated will add much advantage as
>there is the law of natural limitation in everything. For sliding pieces
>and given a 2 bit BB of bbFrom, bbTo, I will have the fromSq and need to get
>toSq. I try this. The sliding path space of 4 are +15, +17, +16, +1.
>
>signed i64 bbTo, bb2bit;
>
>bbTo -= bbFrom;
>no_of_step = i = popcount(bbTo * (1 | bbTo >> 63) & dir_path);//may use abs()
>toSq = fromSQ + i * (1 | bbTo >> 63) * positive_step;
>
>We may also use popcount((bb2bit & bb2bit - 1 - (bb2bit & -bb2bit))& dir_path);
>
>I have not tested if the above is better then BB_TO_SQ(), at least I'm happy not
> having to shift by ranks with a table lookup.
>
>2) I think all BB engine don't have a generator, just incremental updates.
>BB moves are basically the attack BB of all the pieces + 1 extra BB for pawns
>push. N / K don't even need any update unless they move and I have this :-
>
>static const u64 kDzE4 = 0x0000382838000000;
>static const u64 nDzE4 = 0x0028440044280000;
>
>static const u64 nMask[2] = {FILE_3TO8, FILE_1TO6};
>static const u64 kMask[2] = {FILE_2TO8, FILE_1TO7};
>
>u64 getNMap(int sq){
> if (sq >= E4)
> return (nDzE4 << (sq - E4)) & nMask[FILE64(sq) - 4 >> 31 & 1];
> return (nDzE4 >> (E4 - sq)) & nMask[FILE64(sq) - 4 >> 31 & 1];
>}
>
>
>u64 getKMap(int sq){
> if (sq >= E4)
> return (kDzE4 << (sq - E4)) & kMask[FILE64(sq) - 4 >> 31 & 1];
> return (kDzE4 >> (E4 - sq)) & kMask[FILE64(sq) - 4 >> 31 & 1];
>}
>
>Pawns are simple.
>
>We have :-
>//pS is pointer to global stack for my non-recursive search.
>
>  *(pS + 1)->bbMoveStack = *(pS - 1)->bbMoveStack;
>  doMove(board, pS, *pS->pCurrentMove, side);
>  if (genIncrementalBB(board, pS + 1, *pS->pCurrentMove, incheck,
>   pS->sqCaptureEP, side ^ 1 , side, *board->pAllBits));
>  else{//capt King, tmp
>   undoMove(board, pS, *pS->pCurrentMove, side);
>   //etc..
>   ++pS->pCurrentMove;
>   continue;
> }
>
> genIncrementalBB(board, pS, *pS->pCurrentMove, incheck,
>  pS->sqCaptureEP, side, side, *board->pAllBits);
>
>At this point I have the complete BB attack-tables of the two sides and may
>simply extract 1 move at a time and have complete info to select the best move
>of each phase.With offset, I cannot see how incremental move generation may be
>done. Fruit may have partial but not incremental generation.
>

I'm a little confused as to what exactly you're doing in the above code. For
example, getting knight target squares is best done via simply:

unsigned int64 knight_targets = knight_target_array [knight_sq] &
~same_color_pieces;

Anyway, it's not clear that an incremental approach is faster - and it's a hell
of a lot messier. I just compute everything when I need it.

>3) QS move generation - In the past debate, it seemed no one mentioned this.
>   90% of all nodes are QS nodes. Getting BB captures are the simplest. With
>offset we still have to start from the targets and test by sliding thru empty
>squares etc. So BB here may have a significantly influential advantage. Is this
>the "secret of Rybka" ?
>

I don't know that it's really possible to compare board representations, and a
few % speedup isn't all that important. Bitboards are certainly very elegant
when it comes to various forms of "selective move generation".

Vas

>Best Regards,
>Rasjid
>
>
>January 19, 2006 at 12:45:10, Chan Rasjid wrote:
>>
>>>
>>>When using bitboards, often we have arrays that require only indices 0 / 1.
>>>eg if we have no underpromotions, then there is either 1/2 knights per color.
>>>Sometimes ,in such situations, codes may be re-written without using the if ()
>>>statement. But whether such simple tricks can avoid the cost of branch
>>>misprediction ( I only have a vague idea about this ).
>>>
>>>X[i] where i = 0 / 1;
>>>
>>>1) if ( a & b ){
>>>     j = X[1];
>>>   }else{
>>>     j = X[0];
>>>   }
>>>   //etc.. codes dependent on j.
>>>
>>>2) Now without if() :-
>>>
>>>   j = X[(a & b)!= 0];
>>>   //etc.. codes dependent on j.
>>
>>Hi Rasjid,
>>
>>whether a boolean statement works without branches depends on the compiler and
>>architecture. Recent x86 compiler should do at least 2) branchless, using a
>>setnz-instruction for (a & b)!= 0, for instance:
>>
>>xor   edx, edx ; clear dword
>>test  rax, rbx ; get the zero flag
>>setnz dl       ; edx := (a & b) ? 1 : 0
>>mov   edx, X[edx*4] ; assuming X is int array
>>
>>>
>>>How (if ? ) can such a simple re-coding cause an improvement. Can there
>>>be some situations related to the above with significanr misprediction
>>>overheads ?
>>
>>If the pattern is hard to predict and/or you have so many branches, that the
>>branch target buffer is likely to get trashed, i think it is a good idea to
>>avoid conditional jumps. Specially with very small conditional bodies in small,
>>but often used inlined functions or macros.
>>
>>With pattern like 99% of the time 00010001... (for instance 0=then case taken,
>>1=else case taken) conditional jumps are fine. You may try to avoid else cases
>>if possible as well - or leave that and possible branchless tricks to the
>>"pogoable" compiler.
>>
>>While x86 branch replace instructions like setcc, cmovcc, sbb and adc are flag
>>dependent, they don't allow a more dense instruction scheduling for multiple
>>conditional assignments. That is not true for "sign flag" dependency after a
>>subtraction, where the sign flag is also present as MSB inside a register. So
>>one may use logical shift right 31 to get boolean {0,1}-values or arithmetical
>>shift right 31 (or cdq) for {0,-1}-masks, if you are aware of the range of the
>>signed or unsigned values to subtract, for instance:
>>
>>if ( a > b ) x += 7;
>>if ( c > d ) x += 5;
>>
>>// use < 0 expressions - considering ranges!
>>// this unsafe optimization can not be done by the compiler!
>>if ( (signed)(b-a) < 0 ) x += 7;
>>if ( (signed)(d-c) < 0 ) x += 5;
>>
>>// the branchless approach with
>>// some prospects of parallel work
>>x += ((signed)(b-a)>>31) & 7;
>>x += ((signed)(d-c)>>31) & 5;
>>
>>possible assembly:
>>
>>sub   ebx, eax ; b-a
>>sub   edx, ecx ; d-c
>>sar   ebx, 31
>>sar   edx, 31
>>and   ebx, 7
>>and   edx, 5
>>add   r08, ebx ; x += ((signed)(b-a)>>31) & 7
>>add   r08, edx ; x += ((signed)(d-c)>>31) & 5
>>
>>Pairs quite well, but P4 sucks due to slow shift.
>>On amd that will take about the same number of cycles as one sequential
>>conditional assignment. If processing vectors of this stuff, sse2 comes in mind.
>>
>>Gerd
>>
>>>
>>>Thanks
>>>Rasjid



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.