Computer Chess Club Archives

Subject: Re: How to make your movegen 4x slower in 1 easy step

Author: Robert Hyatt

Date: 21:20:28 08/30/01

On August 30, 2001 at 20:45:07, Vincent Diepeveen wrote:

>On August 30, 2001 at 13:56:53, Scott Gasch wrote:
>
>Some years ago I faced the same problems you
>face now.
>
>Without doubt the best solution for windows is what Andrew Dados
>suggests, the global thread variables.
>
>One small problem is that your program then only works on Windows,
>and monsoon no longer works under Linux.
>
>You can slow down your program, Bob is always a fan of that, as overhead
>doesn't need to be 400% of course. Bob estimates it at, I think, 10% or so?
>
>But by far the simplest solution to get rid of all these problems is
>to get multiprocessing.
>
>However you search in parallel, multiprocessing is faster if you
>want to avoid all the tough global thread variable definitions!
>
>Also, on the superb dual AMD SMP chipset there is no longer any disadvantage
>to multiprocessing.
>
>For BSD it has even more advantages, as BSD can only do multiprocessing;
>I heard multithreading might cause problems under BSD on a multiprocessor
>machine.
>
>My tip: go for a 0% overhead, 0% problem approach and go multiprocessing.
>
>Whether you multithread or singlethread, you need to share that hash table
>anyway, so who cares?
>
>
>
>
>


The problem is granularity.  With threads, I can store a global variable
and have all threads see it instantly.  For message passing, as you are
doing, the overhead is significant: thousands of instructions per message,
which means you can't split the tree where you might want one processor to
search a single node.  Threads don't have that overhead.  They have exactly
none.
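
To make the shared-global point concrete, here is a minimal sketch in C, assuming POSIX threads and C11 atomics. The names g_iBestScore, PublishScore, Worker, RunWorkers and NUM_THREADS are hypothetical and do not come from any engine discussed in this thread. The atomic type is an assumption added here so that the unsynchronized sharing is well defined in standard C; the post itself only says that a store to a global is seen by all threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4   /* hypothetical thread count */

/* One copy per process: every thread reads and writes the same object. */
static atomic_int g_iBestScore;

/* Raise the shared score to iScore if iScore is larger (lock-free CAS loop). */
static void PublishScore(int iScore)
{
    int iOld = atomic_load(&g_iBestScore);
    while (iScore > iOld &&
           !atomic_compare_exchange_weak(&g_iBestScore, &iOld, iScore)) {
        /* A failed exchange reloads iOld with the current value; retry. */
    }
}

/* Thread body: publishes a stand-in "search result" derived from its id. */
static void *Worker(void *pArg)
{
    int iId = *(int *)pArg;
    PublishScore(iId * 10);
    return NULL;
}

/* Start the workers, wait for them, and return the shared value. */
int RunWorkers(void)
{
    pthread_t aThreads[NUM_THREADS];
    int aiIds[NUM_THREADS];

    atomic_store(&g_iBestScore, -1);
    for (int i = 0; i < NUM_THREADS; i++) {
        aiIds[i] = i;
        int rc = pthread_create(&aThreads[i], NULL, Worker, &aiIds[i]);
        assert(rc == 0);
        (void)rc;   /* silence unused warning when NDEBUG is defined */
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(aThreads[i], NULL);
    return atomic_load(&g_iBestScore);
}
```

No message is built, sent, or parsed here: publishing a value is a single store (or one compare-and-swap) on memory that every thread already maps. A message-passing design would need some send/receive path for the same update, which is the per-message overhead described above.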





>>I'm moving around data structures in my engine to consolidate things that are
>>going to be needed on a per-cpu basis if/when I go parallel.
>>
>>One such structure is my move stack.  It's a big array of moves with a start and
>>end index per ply.  So for example it might look like this:
>>
>>start[0] = 0  ...  end[0] = 32  [array entries 0..32 hold moves at ply 0]
>>start[1] = 33 ...  end[1] = 60  [array entries 33..60 hold moves at ply 1]
>>...
>>
>>Well if more than one thread is searching at once I will need more than one move
>>stack and more than one ply counter.  So I kept the same move stack struct and
>>made g_MoveStack an array:
>>
>>MOVE_STACK g_MoveStack[NUM_CPU];
>>
>>The code to access the move stack goes from this:
>>
>>g_MoveStack.iStart[g_iPly] = 0;
>>
>>to this:
>>
>>g_MoveStack[iCpuId].iStart[g_iPly[iCpuId]] = 0;
>>
>>Talk about a huge impact -- my move generator benchmark is literally 4x
>>slower!  These dereferences are damn expensive.  There has to be a better way,
>>can one of you assembly gurus give me a clue?
>>
>>Here is a solution I am thinking about -- have a struct per-thread that houses
>>the ply and a pointer to the start of the right move stack entry.  Then do
>>something like this:
>>
>>THREAD_INFO *pThreadInfo = &(g_ThreadInfo[iCurrentThreadId]);
>>(pThreadInfo->pMoveStack)->iStart[pThreadInfo->iPly] = 0;
>>
>>I bet this is just as slow though... Any advice?
>>
>>Thanks,
>>Scott
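
The per-thread struct idea in the quoted question can be sketched as follows. This is only an illustration under stated assumptions: the field layout, the limits (MAX_PLY, MAX_MOVES, NUM_CPU), the MOVE typedef, and the helpers GetThreadInfo, BeginPly and PushMove are all hypothetical, and the move stack is embedded in the struct rather than reached through a pointer. The inclusive start/end convention follows the example in the question (end[0] = 32, start[1] = 33). No timing is claimed; whether this recovers the lost speed would have to be measured.

```c
#include <assert.h>

#define MAX_PLY   64     /* hypothetical limits, not from the original post */
#define MAX_MOVES 2048
#define NUM_CPU   2

typedef unsigned int MOVE;   /* hypothetical move encoding */

typedef struct {
    MOVE aMoves[MAX_MOVES];
    int  iStart[MAX_PLY];    /* first index used by each ply */
    int  iEnd[MAX_PLY];      /* last index used by each ply (inclusive) */
} MOVE_STACK;

typedef struct {
    MOVE_STACK sMoveStack;   /* touched by exactly one thread */
    int        iPly;
} THREAD_INFO;

static THREAD_INFO g_ThreadInfo[NUM_CPU];

/* Resolve the per-thread context once, e.g. at the top of the search. */
THREAD_INFO *GetThreadInfo(int iCpuId)
{
    assert(iCpuId >= 0 && iCpuId < NUM_CPU);
    return &g_ThreadInfo[iCpuId];
}

/* Begin the current ply: its moves start right after the previous ply's. */
void BeginPly(THREAD_INFO *pInfo)
{
    MOVE_STACK *pStack = &pInfo->sMoveStack;
    int iPly = pInfo->iPly;
    assert(iPly >= 0 && iPly < MAX_PLY);
    int iFirst = (iPly == 0) ? 0 : pStack->iEnd[iPly - 1] + 1;
    pStack->iStart[iPly] = iFirst;
    pStack->iEnd[iPly]   = iFirst - 1;   /* empty range until a move is pushed */
}

/* Append one move to the current ply's range. */
void PushMove(THREAD_INFO *pInfo, MOVE mv)
{
    MOVE_STACK *pStack = &pInfo->sMoveStack;
    int iPly = pInfo->iPly;
    assert(pStack->iEnd[iPly] + 1 < MAX_MOVES);
    pStack->aMoves[++pStack->iEnd[iPly]] = mv;
}
```

The design choice is that the array index by CPU id happens once, in GetThreadInfo, and everything below it receives the resulting pointer as a parameter. The move generator then addresses its data relative to one base pointer instead of re-indexing two global arrays on every access.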

Re: How to make your movegen 4x slower in 1 easy step Vincent Diepeveen 06:03:28 08/31/01
- Re: How to make your movegen 4x slower in 1 easy step Robert Hyatt 07:18:13 08/31/01
  - Why multiprocessing is hell faster than multithreading Vincent Diepeveen 03:59:01 09/01/01
    - Re: Why multiprocessing is hell faster than multithreading Robert Hyatt 07:06:24 09/01/01
  - Re: How to make your movegen 4x slower in 1 easy step (more) Robert Hyatt 07:21:02 08/31/01
