Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: How to make your movegen 4x slower in 1 easy step

Author: Bruce Moreland
Date: 21:58:29 08/31/01
On August 30, 2001 at 13:56:53, Scott Gasch wrote:

>I'm moving around data structures in my engine to consolodate things that are
>going to be needed on a per-cpu basis if/when I go parallel.
>
>One such structure is my move stack.  It's a big array of moves with a start and
>end index per ply.  So for example it might look like this:
>
>start[0] = 0  ...  end[0] = 32  [array entries 0..32 hold moves at ply 0]
>start[1] = 33 ...  end[1] = 60  [array entries 33..60 hold moves at ply 1]
>...
>
>Well if more than one thread is searching at once I will need more than one move
>stack and more than one ply counter.  So I kept the same move stack struct and
>made g_MoveStack an array:
>
>MOVE_STACK g_MoveStack[NUM_CPU];
>
>The code to access the move stack goes from this:
>
>g_MoveStack.iStart[g_iPly] = 0;
>
>to this:
>
>g_MoveStack[iCpuId].iStart[g_iPly[iCpuId]] = 0;
>
>Talk about a huge impact -- move move generator benchmark literally is 4x
>slower!  These dereferences are damn expensive.  There has to be a better way,
>can one of you assembly gurus give me a clue?
>
>Here is a solution I am thinking about -- have a struct per-thread that houses
>the ply and a pointer to the start of the right move stack entry.  Then do
>something like this:
>
>THREAD_INFO *pThreadInfo = &(g_ThreadInfo[iCurrentThreadId]);
>(pThreadInfo->pMoveStack)->iStart[sThreadInfo->iCpuId] = 0;
>
>I bet this is just as slow though... Any advice?
>
>Thanks,
>Scott

You are doing this in a goofus way.

I wish I could be authoritative and tell you how to do this, but I might be
doing this in a goofus way, too.

When I decided to make Ferret parallel, I didn't have a multi-processor machine
yet, so I decided to do this same work first.

I had a bunch of global variables, and the first thing I did was to stick them
all in a big struct.

Then I passed a pointer to the struct to *everything*.  From reading Bob's post,
that's what he's doing, too.

When you become parallel, you just make more of these structs.

You end up with overhead due to passing around these pointers, and you have
overhead due to indirecting it.  The overhead is not zero but it's not enormous,
either.

The reason I say that what you are doing is goofus is that you have a zillion
weird-sized arrays you have to deal with, and dealing with the array indexes
must be death.

If you want to see what I did, check out Gerbil.  I believe that it would be
very easy to get two searches going simultaneously in Gerbil, which is like 1/3
of the way to being parallel.

http://www.seanet.com/~brucemo/gerbil/gerbil.htm

It's passing around a pointer to a global struct, too, even though it's not
parallel and will probably never be.

Someone else suggested using a funky MSVC thread based variable, but as I said
in a reply to Vincent, that seems to do some ugly indirection that is probably
worse than passing the pointer around.

A couple of other things:

>start[0] = 0  ...  end[0] = 32  [array entries 0..32 hold moves at ply 0]
>start[1] = 33 ...  end[1] = 60  [array entries 33..60 hold moves at ply 1]

C for-loops are usually done like this:

  for (i = 0; i < N; i++)

"<" is used mostly, rather than "<=".  With this in mind, you'd have a 33 in
end[0].  Which makes it equal to start[1].  So you can get rid of end[], and
just use start[N]..start[N+1] to delimit your stuff, which is something someone
else pointed out.

>THREAD_INFO *pThreadInfo = &(g_ThreadInfo[iCurrentThreadId]);
>(pThreadInfo->pMoveStack)->iStart[sThreadInfo->iCpuId] = 0;

You don't need any of the () in the above.  "&" already has low precedence and
precedence doesn't matter with "->".  The cases that cause trouble are where you
use "*" and "->" in the same reference, and in those cases sometimes you need to
use ().

bruce
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.