Author: Vincent Diepeveen
Date: 14:34:09 10/30/02
Go up one level in this thread
On October 30, 2002 at 01:51:36, Tony Werten wrote:
>On October 29, 2002 at 19:55:08, Vincent Diepeveen wrote:
>
>>On October 29, 2002 at 18:51:53, Gerd Isenberg wrote:
>>
>>><snip>
>>>>>>You could go for something cheap to detect checks in qsearch
>>>>>>and just see whether the 'to' square possibly attacks the king
>>>>>>square, only after that call a function.
>>>>>>
>>>>>>if( quickchecktable[piece][63+kingsq-to]
>>>>>> && slowcheck(piece,to,kingsq) )
>>>>>> check = true; ...
>>>>>
>>>>>In that case I may miss checks that are not done by the piece that moved.
>>>>
>>>>You are concerned about xray checks?
>>>>
>>>>In DIEP i do things like that, but i don't know many engines
>>>>which do checks in qsearch which do them. Note that i do
>>>>unlimited checks in the qsearch for like 32
>>>>ply in a row or so (my stackdepth of qsearch currently is 32 ply
>>>>i did see no reason yet to make that more).
>>>>
>>>>In any case, here is how i do things in qsearch, but i have to
>>>>warn you that it's very slow if you want to get a million
>>>>nodes a second :)
>>>>
>>>>Note that you see i'm using incremental attack tables too.
>>>>
>>>>For a number of years i could do without incremental attack tables
>>>>by simply only using them in evaluation a lot and quite a bit less
>>>>in move ordering.
>>>>
>>>>>I still have ways to improve the speed without missing checks.
>>>>
>>>>In case you want to do *all* checks, then you might want to use
>>>>attacktables too.
>>>>
>>>>Note that i simply generate all moves in qsearch. Total move generation
>>>>all together is like 0.6% system time anyway.
>>>>
>>>>Then i can do all moves from which i gamble that they are making my
>>>>search more quiet. With big evaluation not necessarily captures and
>>>>checks are the only moves that can modify evaluation a lot.
>>>>
>>>>>Note that I cannot use the difference in the squares to decide if a rook pseudo
>>>>>attack the king because the difference between h2 and a3 is the same as the
>>>>>difference between a3 and b3.
>>>>
>>>>but it's very fast to use such a small table that eliminates already the
>>>>vaste majority.
>>>>
>>>>But indeed i also got rid of that primitive table and use a bigger table
>>>>now. The primitive table works great however for programs that want to
>>>>get a million nodes a second and extend check evasions (so we do not
>>>>talk about qsearch here but about normal search where of course
>>>>calling a function to detect whether you are in check is way way too
>>>>slow).
>>>>
>>>>Here is some code from DIEP hoping you will have a look at it:
>>>>
>>>> /* Eerst controleren op schaak */
>>>> if( piece != king && looprichting[piece][t][OpKing]
>>>> && ScanAttack(side,piece,t,OpKing) ) {
>>>> nt->zet |= move_checks;
>>>> }
>>>>
>>>> /* Aftrekschaak */
>>>> if( (AttM[f]&(ctlB|ctlR|ctlQ)) ) {
>>>> if( ((AttM[f]&ctlB) && looprichting[bishop][f][OpKing]
>>>> && ScanXrayAttack(side,bishop,f,OpKing))
>>>> || ((AttM[f]&ctlR) && looprichting[rook][f][OpKing]
>>>> && ScanXrayAttack(side,rook,f,OpKing))
>>>> || ((AttM[f]&ctlQ) && looprichting[queen][f][OpKing]
>>>> && ScanXrayAttack(side,queen,f,OpKing)) ) {
>>>> int bb,bc,bd,be;
>>>> bb = board[t];
>>>> bc = color[t];
>>>> be = snelbord[f];
>>>> bd = snelbord[t];
>>>> snelbord[f] = 0;
>>>> snelbord[t] = 7;
>>>> board[f] = 0;
>>>> color[f] = neutral;
>>>> board[t] = 7; /* zodat schaakje niet van het stuk zelf is */
>>>> color[t] = xside;
>>>> if( SqAttackedBySide(OpKing,side) ) {
>>>> nt->zet |= move_checks;
>>>> aftrekschaak = 1;
>>>> /* #if ForwardPruning
>>>> nt->zet |= move_noprune;
>>>> #endif*/
>>>> }
>>>> board[f] = piece;
>>>> color[f] = side;
>>>> board[t] = bb;
>>>> color[t] = bc;
>>>> snelbord[f] = be;
>>>> snelbord[t] = bd;
>>>> }
>>>> }
>>>>
>>>>
>>>>
>>><snip>
>>>
>>>Hi Vincent,
>>>nice to see some code from DIEP, interesting.
>>>
>>>I guess the declaration for looprichting is something like this:
>>>
>>>bool looprichting[12][64][64]; // 48KB or 40KB for [10]?
>>
>>int looprichting[7][64][64];
>>
>>of course only elements 1..5 get used practical for this check
>>generation so it is:
>> 5 * 4096 * 4 = 80KB
>
>It might be worth it to convert the 2 squares to 0x88 representation here and
>only need 5 * 239 * 4 = 4.78 KB ?
>
>Tony
As i said size is not the issue in DIEP here of the table. I use the
table also for other purposes and as i showed you can do with a very
small lookup table 5 * 128 elements = 640 elements. Whether something
actually attacks a square i can get from my attacktables in O(1) of
course. 0x88 is just another method to get the same answer. the few (5?)
clocks prefetch of the L2 cache for a potential check i do not mind at all.
If you lose 100k clocks a node, then just some 5 clocks prefetch which is
not actual CPU clocks which you lose as it gets prefetched, is not a major
worry of course.
It would be for crafty :)
>>I do not believe in mixing char/bool/int in a program is an easy
>>way to make a program. It needs tuning for each new generation of
>>processors. Of course it could speedup code 5-10% depending upon
>>the processor.
>>
>>But for writing good readable compatible code with not a single
>>partial register stall, obviously making everything 32 bits is
>>the best way to go.
>>
>>I can imagine that because of the major speed loss, mixing 64
>>bits code with 32 bits code in Isichess is not so dumb.
>>
>>Mixing 8 bits code + 32 bits code is something which is no good show
>>to students. It asks for suffering and trouble. the more code there is
>>the more likeliness for bugs there even.
>>
>>>I do it with bitboards in a similar way, but use an array of function pointers
>>>for each piece. For sliding pieces i use also currently a two dimensional array
>>>(bitboards which makes 32KB) indexed by "to" and "king" to get a bitboard
>>>looking for some pieces inbetween.
>>
>>This array which i use above when using 8 bits indexing is 10KB.
>>
>>>Due to mememory latency, i am thinking about a short direction fill, to
>>>determine these "inter"-bitboards. May be an array of function pointers indexed
>>>by let say (8+to-king) with dedicated functions that do directions fills with
>>>1..7 iterations without any conditional jumps.
>>>
>>>For "Aftrekschaak" i use bitboards for pinned pieces and covered
>>>(remove?)checkers, I initialized simultaniously for both sides before
>>>move-generation.
>>
>>I'm not a big fan of writing code for 2 sides. It's waste of time IMHO
>>to do that. Only when i see no other option in my evaluation i do it.
>>I do it nowhere in search though. I'm 100% sure that despite that
>>extra 'side' i reference, my code still is fast there. Fiddling with
>>bitboards and attacks there is hell slow i found out when i did some
>>tries.
>>
>>The 64 bits code at todays 32 bits processors is always suffering
>>from abnormal penalties. I was for example very
>>amazed that the huge stupid code to get a bit out of the bitboards
>>at the K7 was faster than doing the 2 vector instructions in a row.
>>
>>You clearly showed that here and i find it kind of *disgusting*
>>that such things get penalties without the average programmer
>>even *smelling* that there could be danger there.
>>
>>Optimizing for a certain processor like Dieter and Frans and many
>>others (Ed!) always use(d) to do is really a fulltime job IMHO.
>>
>>That 2 branches at the K7 which suffer for sure 1 misprediction,
>>that this still is faster than 2 vector instructions in a row
>>really convinces me that the x86 processors still have a long
>>way to go.
>>
>>I am therefore really happy with the R14000/McKinley processors.
>>
>>No weird behaviour on these things! Writing code as i do works
>>great for those processors! The R14000 has a L2 cache of 8MB and
>>the McKinley from 3 MB (L3), so obviously using such lookup arrays
>>is no problem there.
>>
>>It must be disaster to use 64 bits at pc processors today, because
>>you need to test and retest everything you do with it. Also assumption
>>of crafty inline assembly that compilers put everything is in certain
>>registers IMHO is in itself a sick assumption. Doesn't that give the
>>compilers less possibilities to optimize the code in your opinion?
>>
>>Such functions as get called here like in the above DIEP code, you
>>can do very fast in integer code with a minimum of mispredictions.
>>
>>However as you can see it is not optimized for speed that much. As i already
>>indicate, if my program would be very small and like Quest a look
>>like assembly engines had to run within L1 cache, the array sizes (80KB)
>>are already too big to consider to be useful. DIEP is using that
>>L2 data cache very intensively. Hopefully it gets efficiently
>>prefetched also at the K7 MP processor.
>>
>>In the board of course the gnuchess pawn codes
>>are there (pawn = 1, king = 6), so for pawns you need in the function
>>some quick code to determine check or not. that's a few
>>integer instructions as we all know. No need to fiddle with slow
>>64 bits there.
>>
>>>I have to look whether "from" is member of "remove checkers" bitboard and
>>>whether the move leaves the ray.
>>
>>Why do you use a lookup table for functions Gerd,
>>ain't there faster alternatives without defining a bunch of
>>functions?
>>
>>In my code i already lose of course in advance the difference between
>>white and black. Then secondly i don't need to check for the queen
>>seperately as that's bishop + rook. So all i need is like 1 'if'
>>statement extra which saves out a lookup function table.
>>
>>>Regards,
>>>Gerd
>>>
>>>
>>>// two polymorph routines
>>>Bool CNode::IsCheckMove(UINT to, UINT piece) const {
>>> return (this->*m_scIsCheckMove[piece])(to);}
>>>
>>>Bool CNode::IsCheckMove(UINT from, UINT to, UINT piece) const {
>>> return IsCheckMove(to, piece) || IsRemoveCheckMove(from, to);}
>>>
>>>....
>>>
>>>typedef Bool (CNode::*PTR_ISCHECKMOVE)(UINT to) const;
>>>
>>>CNode::PTR_ISCHECKMOVE CNode::m_scIsCheckMove[14] =
>>>{
>>> KingCheckMove,
>>> KingCheckMove,
>>> WPawnCheckMove,
>>> BPawnCheckMove,
>>> BishopCheckMove,
>>> BishopCheckMove,
>>> KnightCheckMove,
>>> KnightCheckMove,
>>> RookCheckMove,
>>> RookCheckMove,
>>> KingCheckMove,
>>> KingCheckMove,
>>> QueenCheckMove,
>>> QueenCheckMove,
>>>};
>>>
>>>Bool CNode::KingCheckMove(UINT to) const
>>>{
>>> to;
>>> ASSERT(false);
>>> return false;
>>>}
>>>
>>>
>>>Bool CNode::WPawnCheckMove(UINT to) const
>>>{
>>> UINT eksq = EnemyKingSquare();
>>> if ( isDiagonalAdjacent(eksq, to) )
>>> {
>>> register int delta = eksq - to;
>>> return (delta == 7) + (delta == 9);
>>> }
>>> if (to >= 56 )
>>> return QueenCheckMove(to);
>>> return false;
>>>}
>>>
>>>...
>>>
>>>Bool CNode::BishopCheckMove(UINT to) const
>>>{
>>> UINT eksq = EnemyKingSquare();
>>> if ( isDiagonalAdjacent(eksq, to) )
>>> return true;
>>> if ( isDiagonal(eksq, to) )
>>> return (m_sInterBB[eksq][to] & GetPieceBB()) == 0;
>>> return false;
>>>}
>>>
>>>...
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.