Computer Chess Club Archives


Subject: Re: checks in qsearch

Author: Gerd Isenberg

Date: 04:39:24 10/30/02


<snip>
>int looprichting[7][64][64];
>
>Of course only elements 1..5 get used in practice for this check
>generation, so it is:
>  5 * 4096 * 4 bytes = 80KB
>

Maybe a bit array is a more cache-friendly alternative. Tony's proposal
with 0x88 also looks promising.
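
As a rough sketch of the bit-array idea (one possible reading, not code from
DIEP or IsiChess): if the check generator only needs a yes/no answer to "can a
piece of this kind on `from` reach `to` on an empty board", one bit per entry
is enough. The names `PieceKind`, `gReach`, `initReach` and `mayReach` are
hypothetical, and only four piece kinds are covered here, so the table is
4 * 4096 bits = 2KB. Unlike the quoted int table, it does not store the
direction itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical piece kinds; not the numbering used by DIEP or IsiChess.
enum PieceKind { KNIGHT, BISHOP, ROOK, QUEEN, KIND_COUNT };

// One bit per (kind, from, to): gReach[kind][from] has bit 'to' set when
// the piece could reach 'to' from 'from' on an empty board.
static uint64_t gReach[KIND_COUNT][64];

// Pure geometry, used only while filling the table.
static bool geometricReach(PieceKind kind, int from, int to) {
    if (from == to) return false;
    int df = std::abs((from & 7) - (to & 7));    // file distance
    int dr = std::abs((from >> 3) - (to >> 3));  // rank distance
    bool diag = (df == dr);
    bool line = (df == 0 || dr == 0);
    switch (kind) {
        case KNIGHT: return df * dr == 2;
        case BISHOP: return diag;
        case ROOK:   return line;
        case QUEEN:  return diag || line;
        default:     return false;
    }
}

// Fill the packed table once at startup.
void initReach() {
    for (int k = 0; k < KIND_COUNT; ++k) {
        for (int from = 0; from < 64; ++from) {
            uint64_t bits = 0;
            for (int to = 0; to < 64; ++to)
                if (geometricReach(static_cast<PieceKind>(k), from, to))
                    bits |= uint64_t(1) << to;
            gReach[k][from] = bits;
        }
    }
}

// Lookup: one 8-byte load, a shift and a mask.
inline bool mayReach(PieceKind kind, int from, int to) {
    assert(from >= 0 && from < 64 && to >= 0 && to < 64);
    return ((gReach[kind][from] >> to) & 1) != 0;
}
```

Whether the smaller footprint outweighs the extra shift and mask would have to
be measured on the target processor.
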


>I do not believe in mixing char/bool/int in a program is an easy
>way to make a program. It needs tuning for each new generation of
>processors. Of course it could speedup code 5-10% depending upon
>the processor.
>
>But for writing good readable compatible code with not a single
>partial register stall, obviously making everything 32 bits is
>the best way to go.
>
>I can imagine that because of the major speed loss, mixing 64
>bits code with 32 bits code in Isichess is not so dumb.
>
>Mixing 8-bit code with 32-bit code is not a good thing to show
>to students. It asks for suffering and trouble. The more code there is,
>the more likely bugs become.
>
>>I do it with bitboards in a similar way, but use an array of function pointers
>>for each piece. For sliding pieces i use also currently a two dimensional array
>>(bitboards which makes 32KB) indexed by "to" and "king" to get a bitboard
>>looking for some pieces in between.
>
>This array which i use above when using 8 bits indexing is 10KB.
>
>>Due to memory latency, i am thinking about a short direction fill to
>>determine these "inter"-bitboards. Maybe an array of function pointers indexed
>>by, let's say, (8+to-king) with dedicated functions that do direction fills with
>>1..7 iterations without any conditional jumps.
>>
>>For "Aftrekschaak" (discovered check) i use bitboards for pinned pieces and
>>covered (remove?)checkers, initialized simultaneously for both sides before
>>move-generation.
>
>I'm not a big fan of writing code for 2 sides. It's waste of time IMHO
>to do that. Only when i see no other option in my evaluation i do it.
>I do it nowhere in search though. I'm 100% sure that despite that
>extra 'side' i reference, my code still is fast there. Fiddling with
>bitboards and attacks there is hell slow i found out when i did some
>tries.
>
>The 64 bits code at todays 32 bits processors is always suffering
>from abnormal penalties. I was for example very
>amazed that the huge stupid code to get a bit out of the bitboards
>at the K7 was faster than doing the 2 vector instructions in a row.
>
>You clearly showed that here and i find it kind of *disgusting*
>that such things get penalties without the average programmer
>even *smelling* that there could be danger there.
>
>Optimizing for a certain processor like Dieter and Frans and many
>others (Ed!) always use(d) to do is really a fulltime job IMHO.
>
>That 2 branches at the K7 which suffer for sure 1 misprediction,
>that this still is faster than 2 vector instructions in a row
>really convinces me that the x86 processors still have a long
>way to go.
>
>I am therefore really happy with the R14000/McKinley processors.
>
>No weird behaviour on these things! Writing code as i do works
>great for those processors! The R14000 has a L2 cache of 8MB and
>the McKinley from 3 MB (L3), so obviously using such lookup arrays
>is no problem there.
>
>It must be a disaster to use 64 bits on PC processors today, because
>you need to test and retest everything you do with it. Also, the assumption
>in Crafty's inline assembly that compilers put everything in certain
>registers is IMHO in itself a sick assumption. Doesn't that give the
>compilers fewer possibilities to optimize the code in your opinion?
>
>Such functions as get called here like in the above DIEP code, you
>can do very fast in integer code with a minimum of mispredictions.
>
>However, as you can see, it is not optimized for speed that much. As i already
>indicated, if my program were very small and, like Quest and look-alike
>assembly engines, had to run within L1 cache, the array sizes (80KB)
>would already be too big to be considered useful. DIEP is using that
>L2 data cache very intensively. Hopefully it gets efficiently
>prefetched also at the K7 MP processor.
>
>In the board of course the gnuchess pawn codes
>are there (pawn = 1, king = 6), so for pawns you need in the function
>some quick code to determine check or not. that's a few
>integer instructions as we all know. No need to fiddle with slow
>64 bits there.
>
>>I have to look whether "from" is member of "remove checkers" bitboard and
>>whether the move leaves the ray.
>
>Why do you use a lookup table for functions Gerd,
>ain't there faster alternatives without defining a bunch of
>functions?

It's simply an alternative to the "switch" statement. It should be inlined so
that it has no more call overhead than a call to a single function.

__forceinline
Bool CNode::IsCheckMove(UINT to, UINT piece) const {
   return (this->*m_scIsCheckMove[piece])(to);
}
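
For readers who want to try the dispatch idea in isolation, here is a minimal
self-contained sketch of a static table of member function pointers indexed by
piece. It is not the IsiChess code: the class `Node`, the handlers
`KnightChecks` and `RookChecks`, and the two-entry piece enum are hypothetical,
the rook handler ignores blockers, and plain in-class inlining stands in for
the MSVC-specific `__forceinline`.

```cpp
#include <cassert>
#include <cstdlib>

typedef unsigned int UINT;

class Node {
public:
    // Hypothetical piece codes, used only as table indices.
    enum Piece { KNIGHT, ROOK, PIECE_COUNT };

    explicit Node(UINT kingSq) : m_kingSq(kingSq) {}

    // One indirect call through the table replaces a switch on 'piece'.
    bool IsCheckMove(UINT to, UINT piece) const {
        assert(piece < PIECE_COUNT);
        return (this->*s_isCheckMove[piece])(to);
    }

private:
    typedef bool (Node::*CheckFn)(UINT to) const;

    // Knight on 'to' attacks the king square (pure geometry).
    bool KnightChecks(UINT to) const {
        int df = std::abs(int(to & 7) - int(m_kingSq & 7));
        int dr = std::abs(int(to >> 3) - int(m_kingSq >> 3));
        return df * dr == 2;
    }

    // Rook on 'to' shares a rank or file with the king.
    // Assumption for brevity: no blockers are considered.
    bool RookChecks(UINT to) const {
        return to != m_kingSq &&
               ((to & 7) == (m_kingSq & 7) || (to >> 3) == (m_kingSq >> 3));
    }

    static const CheckFn s_isCheckMove[PIECE_COUNT];
    UINT m_kingSq;  // square of the king that may be checked
};

// The dispatch table: one handler per piece code.
const Node::CheckFn Node::s_isCheckMove[Node::PIECE_COUNT] = {
    &Node::KnightChecks,
    &Node::RookChecks
};
```

With the king on e1 (square 4), `Node(4).IsCheckMove(14, Node::KNIGHT)` asks
whether a knight landing on g2 gives check. The wrapper itself can be inlined,
but the call through the table remains an indirect call.
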

Gerd




Last modified: Thu, 15 Apr 21 08:11:13 -0700
