Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: A faster move generator than previously known

Author: Vincent Diepeveen

Date: 15:19:04 08/08/03

Go up one level in this thread


On August 08, 2003 at 16:15:26, Gerd Isenberg wrote:

Gerd, not calling the code sensational is the biggest compliment you can give.
Using very simple instructions is the key to victory always at getting fast at
processors. Normal instructions will always be very very fast. ADD AND XOR and
so on. Those instructions get executed by parts of the processor which run them
for sure very fast in any order.

I would rather be surprised if in your kogge-stone experiments in the slow SSE2
registers, you would find an order of instructions that executes quickly.

In that respect i have to mention an important note on my way of programming
software in general. I am very well aware that if you move the integer
generator to unsigned and mix with unsigned chars that you
can speed it up a lot. Quite a lot. I wouldn't be surprised under runtime
conditions about 10% actually thanks to huge L2 cache savings.

This move generator didn't start like it is now. When i wrote it,
it was a 8 bits move generator in fact. Only the move itself was
a 32 bits integer. Each year i was more dissappointed when i continuesly found
out that all that casting triggered bugs in my code. Of course i was the problem
of those bugs. Forgetting which compiler followed which ANSI-C convention and
forgetting ansi-c conventions about casting anyway. Something Dieter would never
do for example.

Dieter is the type of programmer who really seeks the risk there IMHO. With his
very consequent way of not forgetting those conventions and what happens where
at what point i am sure he will manage.

But i won't. I try to prevent all crisis in my code. I try to prevent wasting
time on thinking how i might speed up by mixing types everywhere in my program i
can speed it up at a certain processor a few %. Even though at old processors
the speedup is huge.

Trivially at itanium2 and R14000 and P4 this won't be the case soon.

Where you might love inline assembly and all other kind of toying
with instructions, i am preferring to keep clean code and program
without possible casting bugs therefore.

Now i am sure you can keep out of all the shit trouble i had with
casting bugs in the past, but i simply prefer a neat programming style
using 'int' everywhere, above toying with unsigned and uchar issues
in the code.

That's why it is all integer code and will remain so.

Very interesting is your idea you point down at the end.

Yes i have toyed a lot at other places in my code rewriting code such
that cmov* instructions were triggered by the compiler. I was horrified by how
slow they were, though i must admit that i have no clue how fast they are at the
opteron and p4.

When i tested the code it was not faster at the time. P3 and K7. However if i
can save out a branch, then it sure is worth the effort here. I'll have a look
at that.

>On August 07, 2003 at 23:48:51, Vincent Diepeveen wrote:
>
>>You guys can figure out the rest i bet seeing this code.
>>
>>All that bitboard idiocy always. This kicks the hell out of it.
>>
>>Vincent Diepeveen,
><snip>
>
>Vincent,
>
>after a first look to your Move generator - i see nothing special or
>sensational. A simple and clean straight foreward approach, without any kind of
>move ordering considering target squares i guess. An outer loop over all members
>of a piecelist, calling inlined GenMovesI for a square of each piece.
>
>In GenMovesI after getting the piece from board, there is first a pawn case and
>second a none pawn case, i guess one or two miss predictions per piecelist.
>
>In pawn case you have one outer if !promotion else promotion
>
>Btw. if your row function is unsigned, it may cheaper to ask:
>
>    if( (row(u)-1) < (7-1) )
>
>In the none promotion case you have three or four further ifs.
>The double push combined with boolean and.
>The promotion part has obviously three additional ifs.
>
>In the piece case you distinguish between sliding pieces and none sliding
>pieces. One case where missprediction is likely, even if your piecelist is
>sorted in ascending piece order.
>
>In both cases you have a do-while loop with one if (color[u] != side) inside.
>
>I understand that the critical inner branches are most often predicted correctly
>with color[u]. But anyway i don't like it in inner loops - even correct
>predicted branches take some time due to cmp and jxx instructions.
>And obviously you loop over all targets, occupied by own pawns and pieces.
>Simply not necessary work for pieces defending some pawns and other pieces.
>And the additional inner "if" may blow up your loop bodies about some cache line
>boundaries...
>
>I learned, that it is better to put loop invariant conditions (not the case
>here, but a kind of due to correct prediction) outside the loop and to keep the
>loop bodies brancheless.
>
>Ok, i see the problem with bitboard traversal several piece and target subsets
>with special exclusive source/target properties via bitscan. They are most often
>low populated or empty and the while condition fails quite soon.
>But after bitscanAndReset one can already calculate the while condition before
>the loop body occurs, which may also be helpfull to predict correctly postbody.
>
>Have you ever tried conditional write?
>
>      if( color[u] == xside ) {
>        rb->zetend->zet = zet;
>        rb->zetend++;
>
>      rb->zetend->zet = zet;
>      rb->zetend += ( color[u] == xside );
>
>
>Regards,
>Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.