Computer Chess Club Archives

Search

Terms
Messages

Subject: Debugging a parallel search (LONG)

Author: Vincent Diepeveen
Date: 16:10:08 06/17/05
On June 17, 2005 at 01:52:45, Daniel Shawul wrote:

>On June 16, 2005 at 18:01:58, Vincent Diepeveen wrote:
>
>>On June 15, 2005 at 16:36:54, Eric Oldre wrote:
>>
>>Of course the more time you invest in debugging the better.
>>
>
>Hi
>i have tried to output the tree to disk,when i tried parallel
>tree searching using PV split method. Thread1 outputs to file1,and
>Thread2 to file2. But the output produced is unreadable.
>you might have already explanined how to do this, but i couldn't get it.
>best
>daniel

For parallel search what really debugged all my bugs is next :

at startup i allocate 1 big shared memory block which is a huge array of
integers. additional i have 1 global volatile lock in the shared memory.

each processor is doing next in search. If they hit at a line that writes debug
info, then i'm doing basically this :

In the code you will find for example this :

      #if DEBUGNODESVERBOSE
      dverb_info[0] = (int)rb->idnumber;
      dverb_info[1] = rb->alfa;
      dverb_info[2] = rb->beta;
      dverb_info[3] = rb->ply; // depthleft
      DVERB(4,dverb_abigetblock);
      #endif

Now the function DVERB is doing this :
void DVERB(int ninfo,int dverbflag) {
  int header,i;
  /*  header byte:
   *  32 bits waarvan:
   *    1   sign   (unused)
   *   17   laatste debugsmp punt (indien aanwezig)
   *   10   verbosepoint:    dverb_...
   *    4   processornummer (max 0..15)
   */
  //printf("locking dverb ninfo=%i flag=%i\n",ninfo,dverbflag);
  Lock(tree->lock_dva);
  header = 0;
  //#endif
  header |= (dverbflag<<4)|ProcessNumber;
  dva[tree->dva_count++] = header;
  i = 0;
  while( i < ninfo ) {
    dva[tree->dva_count++] = dverb_info[i];
    i++;
  }

  if( tree->dva_count >= DVASIZE-64 ) {
    int tdr;

    printf("het probleem is duidelijk!\n");
    tdr = tree->dva_rond+1;
  //  printf("Passed DVASIZE (%i bytes) already %i
times!\n",sizeof(int)*tree->dva_count,tdr);
    if( nprocesses == 1 && false ) {
      //printf("gonna save it now\n");
      Unlock(tree->lock_dva);
      SMPdebugSAVE();
      InitVerbose();
      Lock(tree->lock_dva);
    }
    else {
      int m,move;
      //mprint("processes = %i rounds = %i\n",tree->nprocesses,tdr);
      InitVerbose();
      /* nu vastleggen het volgende: de zetstack van de verschillende processors
*/
      dva[tree->dva_count++] = (dverb_movestacks<<4);
      for( i = 0; i < nprocesses; i++ ) {
        dva[tree->dva_count++] = 5;
        m = 0;
        //mprint("stored line in DVA: %i ... ",tree->movesmadeindex[i]);
        while( m < 5 ) {
          move = m;
          mprint("%s%s ",M2A[(move>>6)&63],M2A[move&63]);
          dva[tree->dva_count++] = move;
          m++;
        }
        //mprint("\n");
      }
    }
    tree->dva_rond = tdr;
  }
  Unlock(tree->lock_dva);
}

So basically lock, store the search info as integers in the array and unlock.
When it has crashed, all you have to do is write the shared memory to disk and
then parse the shared memory until you found the processor that crashed and you
figure out where it was and what it was doing and the state at which other
processors were at that moment and so on.

And then pray that you crash very quickly. Basically when this testbed works
fine all you have got to do is find the quickest way to crash or get a deadlock
or race condition in your program. Then debug backwards.

That's not so easy and requires a lot of experience :)

But you can get very complex parallel forms of search to work with this method.
Because basically every processor directly locks and unlocks, the odds are not
so big that this 'debug code' in shared memory will eat all system time.

Your evaluation will, your move ordering will and so on.

It's just a sequential fill of shared memory.

Then additional write tools to find relevant data within that huge file.

Ways to really quickly crash your program is important to find too. A good one
i'll give for free, that's use the utmost stupid way to split, like splitting
always and especially as much near the leaves as possible. And start searching
with 16 cpu's :)

That really crashes the program quickly :)

Now the way in which i print out things. To give a real example of 16 processors
(note my printing routines aren't even good working for it) :

0...............  SplitInOthers: processnumber=0 newproc=8 idnumberposition=1236
0...............  SplitInOthers: processnumber=0 newproc=15 idnumberposition=123
6
0...............  SplitInOthers: processnumber=0 newproc=1 idnumberposition=1236
..........10.....  Splitting position id=1236
..........10.....  Splitting position: found and moved to position 1236
.......7........  Splitting position id=1236
.........9......  Splitting position id=1236
.........9......  Splitting position: found and moved to position 1236
.......7........  Splitting position: found and moved to position 1236
.....5..........  Splitting position id=1236
.....5..........  Splitting position: found and moved to position 1236
..2.............  Splitting position id=1236
..2.............  Splitting position: found and moved to position 1236
...........11....  Splitting position id=1236
...........11....  Splitting position: found and moved to position 1236
........8.......  Splitting position id=1236
........8.......  Splitting position: found and moved to position 1236
............12...  Splitting position id=1236
............12...  Splitting position: found and moved to position 1236
...3............  Splitting position id=1236
...3............  Splitting position: found and moved to position 1236
......6.........  Splitting position id=1236
......6.........  Splitting position: found and moved to position 1236
....4...........  Splitting position id=1236
.............13..  Splitting position id=1236
..............14.  Splitting position id=1236
....4...........  Splitting position: found and moved to position 1236
...............15  Splitting position id=1236
.1..............  Splitting position id=1236
..............14.  Splitting position: found and moved to position 1236
...............15  Splitting position: found and moved to position 1236
.............13..  Splitting position: found and moved to position 1236
.1..............  Splitting position: found and moved to position 1236
............12...  Abortme() id=1236
............12...  splitid=0 depth=5 procs=[0,6,10,3,2,-1,9,13,4,11,5,7,14,8,15,
1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
............12...  PartBoth: proc just returned to idle searching 0 nodes
............12...  ActionCheck() flags=1 realply=0
.....5..........  Abortme() id=1236
.....5..........  splitid=0 depth=5 procs=[0,6,10,3,2,-1,9,13,4,11,-1,7,14,8,15,
1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
.....5..........  PartBoth: proc just returned to idle searching 0 nodes
.....5..........  ActionCheck() flags=1 realply=0
..............14.  Abortme() id=1236
..............14.  splitid=0 depth=5 procs=[0,6,10,3,2,-1,9,13,4,11,-1,7,-1,8,15
,1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
..............14.  PartBoth: proc just returned to idle searching 0 nodes
..............14.  ActionCheck() flags=1 realply=0
...3............  Abortme() id=1236
...3............  splitid=0 depth=5 procs=[0,6,10,-1,2,-1,9,13,4,11,-1,7,-1,8,15
,1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
...3............  PartBoth: proc just returned to idle searching 0 nodes
...3............  ActionCheck() flags=1 realply=0
......6.........  Abortme() id=1236
......6.........  splitid=0 depth=5 procs=[0,-1,10,-1,2,-1,9,13,4,11,-1,7,-1,8,1
5,1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
......6.........  PartBoth: proc just returned to idle searching 0 nodes
......6.........  ActionCheck() flags=1 realply=0
.............13..  Abortme() id=1236
.............13..  splitid=0 depth=5 procs=[0,-1,10,-1,2,-1,9,-1,4,11,-1,7,-1,8,
15,1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
.............13..  PartBoth: proc just returned to idle searching 0 nodes
.............13..  ActionCheck() flags=1 realply=0
....4...........  Abortme() id=1236
....4...........  splitid=0 depth=5 procs=[0,-1,10,-1,2,-1,9,-1,-1,11,-1,7,-1,8,
15,1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
....4...........  PartBoth: proc just returned to idle searching 0 nodes
....4...........  ActionCheck() flags=1 realply=0
.1..............  Abortme() id=1236
.1..............  splitid=0 depth=5 procs=[0,-1,10,-1,2,-1,9,-1,-1,11,-1,7,-1,8,
15,-1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
.1..............  PartBoth: proc just returned to idle searching 0 nodes
.1..............  ActionCheck() flags=1 realply=0
...............15  Abortme() id=1236
...............15  splitid=0 depth=5 procs=[0,-1,10,-1,2,-1,9,-1,-1,11,-1,7,-1,8
,-1,-1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
...............15  PartBoth: proc just returned to idle searching 0 nodes
...............15  ActionCheck() flags=1 realply=0
0...............  Abortme() id=1236
0...............  splitid=0 depth=5 procs=[-1,-1,10,-1,2,-1,9,-1,-1,11,-1,7,-1,8
,-1,-1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
0...............  PartBoth: proc just returned to idle searching 0 nodes
0...............  ActionCheck() flags=1 realply=0
..........10.....  Abortme() id=1236
..........10.....  splitid=0 depth=5 procs=[-1,-1,-1,-1,2,-1,9,-1,-1,11,-1,7,-1,
8,-1,-1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
..........10.....  PartBoth: proc just returned to idle searching 0 nodes
..........10.....  ActionCheck() flags=1 realply=0
..2.............  Abortme() id=1236
..2.............  splitid=0 depth=5 procs=[-1,-1,-1,-1,-1,-1,9,-1,-1,11,-1,7,-1,
8,-1,-1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
..2.............  ActionCheck() flags=1 realply=0
.......7........  Abortme() id=1236
.......7........  splitid=0 depth=5 procs=[-1,-1,-1,-1,-1,-1,9,-1,-1,11,-1,-1,-1
,8,-1,-1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
.......7........  PartBoth: proc just returned to idle searching 0 nodes
.......7........  ActionCheck() flags=1 realply=0
........8.......  Abortme() id=1236
........8.......  splitid=0 depth=5 procs=[-1,-1,-1,-1,-1,-1,9,-1,-1,11,-1,-1,-1
,-1,-1,-1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
........8.......  PartBoth: proc just returned to idle searching 0 nodes
........8.......  ActionCheck() flags=1 realply=0
...........11....  Abortme() id=1236
...........11....  splitid=0 depth=5 procs=[-1,-1,-1,-1,-1,-1,9,-1,-1,-1,-1,-1,-
1,-1,-1,-1] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
...........11....  PartBoth: proc just returned to idle searching 0 nodes
...........11....  ActionCheck() flags=1 realply=0
.........9......  MakeSplittable() id=1233 realply=3
.........9......  registeridlesplits owner proc=9
.........9......  splitid=0 depth=3 procs=[11,8,7,2,10,0,9,15,1,4,13,6,3,14,5,12
] daughters=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
.........9......  SplitInOthers: processnumber=9 newproc=11 idnumberposition=123
3
and so on

As you see this output is just printing all actions about parallel search, not
even printing info regarding the actual searched tree (as that would be hundreds
of megabytes).

The huge advantage of writing the debug info to shared memory is that it eats so
little time that the processors get more time into their splitting routines and
get therefore bigger odds to crash if there is a bug.

>>Apart from boundschecking with a real boundschecker; gcc has a great
>>boundschecker extension at their homepage, windows has nothing that impressed me
>>in fact like devpartner doesn't even find the most obvious boundsbug in my GUI.
>>Further there is normal debugging ways, the most effective way in which i debug
>>Diep if there is problems is in the next trivial way:
>>
>>That simply printing the entire search tree to disk. That can be done either
>>direct or indirect :
>>
>>example output (written by hand just to show example now for 4 processors) :
>>
>>0... searching root
>>0...  makemove a2a3
>>0... a2a3 makemove d7d5
>>0... a2a3 d7d5 eval -0.200
>>0... a2a3 d7d5 ab=[0.001;0.002] cutoff -0.200
>>...
>>
>>And when processor 1 is doing an action :
>>.1.. blabla
>>
>>In short this is a very powerful way to debug search or debug parallel bugs.
>>Of course for tough race conditions in a parallel search it is not so clever to
>>directly print to textfile, as the time spent to print is too long (which means
>>a deadlock or race conditions will happen not so soon as the time spent to
>>printing is too huge).
>>
>>So i print it in a shared block of memory (locking first of course) and later
>>parse that memory block to the above text.
>>
>>In this way i managed to debug the parallel search of Diep.
>>
>>For single cpu just printing directly to disk will do fine and you can examine
>>how big your trees get at small search depths up to a ply or 6 is no problem.
>>
>>You'll find many weird stuff in your qsearch and your move ordering like this.
>>
>>Of course it's time consuming to do the above (reading those huge textfiles) but
>>it's very rewarding.
>>
>>The above has another huge advantage and that it is finding bugs in your
>>evaluation function and directly gives you an answer whether a new compiler
>>sucks, is dangerous to use or whether you have a so far unfound bug which
>>boundscheckers didn't find.
>>
>>That is, provided you also print the evaluation score in the textfile.
>>
>>You can compile at different platforms with different type of compilers your
>>code and just compare the textfile verbose. If it is not 100% similar, then
>>obviously there is a bug somewhere.
>>
>>This really will remove about all nasty to find bugs in your entire
>>search&evaluation function.
>>
>>It is the by far the best test to see whether a new compiler has bugs.
>>
>>Especially many intel compilers and certain gcc compilers did not pass this test
>>in past when using -O3 or in case of intel just -O2. gcc 3.3.x and 3.4.x was
>>real real buggy here. 4.0 is far better.
>>
>>Of course the fact that you *seriously* look to debugging methods in your engine
>>already means that your thing is going to be more bugfree than majority of
>>amateur programs. Most haven't heard in fact from the word 'boundschecker'.
>>
>>The fact that out of any random compiler with this trick i can find
>>optimizations that produce bugs, tells me how poor the different compiler teams
>>are in debugging their compilers.
>>
>>Diep is not using any difficult casting construction or so, i never take risks
>>there. Additional it is 100% integer code, so no floating point tricks are
>>possible for compilers.
>>
>>Especially the GCC team is the most horrible code writing team on planet earth.
>>
>>Like 99% of their releases suffers so much from bugs that i can't even compile
>>with -O3 without catching bugs.
>>
>>Obviously that i can prove that with such a simple debugging method is telling
>>really a lot about how powerful the verbose printing method of a search is.
>>
>>>I was wondering if anyone would like to volunteer any tricks they've
>>>used to help find certain bugs in their search function.
>>
>>>I think that the ideas of using perft for move generation and
>>>reversing the board to find bugs in the evaluation have both
>>>been really useful to me. I was wondering if anyone has used
>>>techniques similar to these to help find search bugs.
>>>
>>>I understand that just because a engine can properly pass these
>>>and other tests doesn't mean it's free of bugs. But they certainly
>>>help detect at least some of them.
>>>
>>>I'm certain that there must be plenty of bugs in Latista's search
>>>and I think it's time for me to work on discovering them. If
>>>you don't have any automated tricks like above. Does anyone
>>>have any general advise to help me spot some bugs?
>>>
>>>Eric Oldre
>>>
>>>PS. I have at various spots in my program tried to follow a similar
>>>model of asserts as in fruit. I'm sure taking some time to
>>>do this at more parts of my program would help.
Thanks! n/t Daniel Shawul 22:39:56 06/17/05
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.