Author: Vincent Diepeveen
Date: 18:12:25 01/12/03
Go up one level in this thread
On January 12, 2003 at 17:42:03, Bas Hamstra wrote:
>On January 12, 2003 at 17:18:17, Vincent Diepeveen wrote:
>>On January 12, 2003 at 17:14:18, Bas Hamstra wrote:
>>you need indepth processor knowledge if you try to get a concept to work fast
>>which needs a lot of work to get fast.
>
>I have a book about x86 optimation. "Inner loops" I believe it is called. It
>crushes some well accepted C-myths. It also states that with the later
>processors it becomes more and more difficult to outsmart the compilers with
>asm.
i am sure that if i would write asm that perhaps i would take weeks to beat the
compiler for a short piece of code. But i would kick its ass.
>>this whereas bitboards is all optimized very well for the concept itself.
>>you work against assembly language stuff in bitboards.
>>so you better figure out how a processor works before writing a move generator.
>>
>>the improtant problems to solve
>> - limit the number of branches
>> - arrays are allowed to use but try to minimize them
>> - do not mix 8 bits with 32 bits. perhaps faster on k6 and even k7, but
>> not always. and not at p4.
>
>>the square attacked with incremental bitboards is a simple AND.
>>see gnuchess 4.0 how to do it (not later raped versions of it).
>
>But then you imply an incrementally updated attacktable. I doubt that is going
>to break the nps record, which is what I intended to do :-) I wanted to start
>with a very fast piece-square searcher, good move sorting, checks in qsearch and
>extensive pawn strucure analysis (in pawnhash). Now I know you are going to say
>this won't get me anywhere, I KNOW it, but I just want to try for fun :-) And
>maybe I will release the source, which I intend to be max 2000 lines. Including
>UCI (400 lines), not winboard (20000+ lines).
i hope you realize that there is a big contradiction between 'extensive pawn
structure analysis', fast search, checks in qsearch, good move sorting and fast
nps.
It is not hard to make something search 10 million nodes a second. It will be
horrible inefficient though.
It is not hard to make something search deep with 2 million nodes a second
(single cpu) and just psq.
But fastest of all is and will remain of course replacing qsearch by a SEE.
Just add the SEE to eval and that's it. Very well possible with PSQ progs
Basically you want to make 'extensive pawn structure analysis'. I would call it
dumb pawn patterns, because you do only consider pawns versus pawns in your
hashtable.
what's the use of that?
You can search way faster by not doing checks in qsearch. Crafty is best example
of it.
Also no pawnevaluatoin there of course and a big lazy eval. Ed's 0.50 pawn which
is ridicioulous small margin, springs to mind.
i hope you realize that if you search at a few million nps, that getting from
main memory a cache line with the pawnstructure code is very expensive. it is
about 400 clocks or so.
Very expensive!
then you also waste system time to figuring out which checks to try. Very
expensive.
Slowing you down a lot.
i hope you see the problem.
now about storing qsearch in hashtables. i do it with diep. Can you do it with
such a fast program?
i bet not.
in fact my draughtsprogram cannot store itself last ply into hashtable either.
too expensive. It is just half a million nps a second at a 1.6Ghz K7!
when you want to get fast, you are only busy with tradeoffs. this is the main
problem. From fritz' speed we can see however that investing time in figuring
out which checks to do and good move ordering has to work very well. it's very
fast searching, but despite very great written assembly it's not that much
faster than crafty, yet it outsearches it lightyears.
Best regards,
Vincent
>Best regards,
>Bas.
>
>>>I am playing with a 0x88 implementation, that is supposed to be faster than my
>>>rotated BitBoard program. Just for fun. However I am a little disappointed about
>>>the speed gains so far. I have learned in the past that there are a couple of
>>>very speed critical functions:
>>>
>>>- make/unmake
>>>- gencaptures
>>>- squareattacked
>>>- SEE
>>>
>>>So far my 0x88 make/unmake is about the same speed as in my BB program Tao. The
>>>second function I tested was bool SquareAttacked(int Sq, int ByColor). I was a
>>>little disappointed to see it is more than 2x slower than my rotated BitBoard
>>>version.
>>> rot BB 0x88
>>>Makes/Unmakes per sec: 4237288 4.25M
>>>GenCaptures per sec: 1915709 ?
>>>GenCapturesChecks per sec: 1569859 ?
>>>SquareAttacked per sec: 18181818 8M !!!
>>>SwapValue[c1e3] per sec: 6756757 ?
>>>GetBishopAttacks[26] per sec: 45454545 ?
>>>
>>>Below the 0x88 function used for SquareAttacked, which for the moment is the
>>>fastest I can think of:
>>>
>>>bool TBoard::SquareAttacked(int To, int ByColor)
>>>
>>>{ int n,
>>> D,
>>> From,
>>> Type,
>>> Sq,
>>> Count = PieceCount[ByColor];
>>>
>>> if(ByColor==WHITE && Bord[To-17] == 13) return true;
>>> if(ByColor==WHITE && Bord[To-15] == 13) return true;
>>> if(ByColor==BLACK && Bord[To+17] == 12) return true;
>>> if(ByColor==BLACK && Bord[To+15] == 12) return true;
>>>
>>> for(n=0; n<Count; n++)
>>
>>why use all this slow code for something that should be a single AND?
>>
>>> { From=PieceList[ByColor][n];
>>> Type=Bord[From]>>1;
>>> if(Type==PAWN) break;
>>> if(PseudoAtt[127+To-From] & (1<<Type) )
>>> { if(Type==KING || Type==KNIGHT) return true;
>>> D = Dir[127+To-From];
>>> for(Sq=From+D; Sq!=To; Sq+=D)
>>> if(Bord[Sq]) break;
>>> if(Sq==To) return true;
>>> }
>>> }
>>>
>>> return false;
>>>}
>>>
>>>All in all a little disappointing so far. Are the tiny loops *really* so
>>>amazingly fast?
>>>
>>>Best regards,
>>>Bas.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.