Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: How to make your movegen 4x slower in 1 easy step

Author: Robert Hyatt

Date: 07:18:13 08/31/01

Go up one level in this thread


On August 31, 2001 at 09:03:28, Vincent Diepeveen wrote:

>On August 31, 2001 at 00:20:28, Robert Hyatt wrote:
>
>>On August 30, 2001 at 20:45:07, Vincent Diepeveen wrote:
>>
>>>On August 30, 2001 at 13:56:53, Scott Gasch wrote:
>>>
>>>Some years ago i was faced with the same problems as you
>>>face now with.
>>>
>>>Without doubt the best solution for windows is what Andrew Dados
>>>suggests, the global thread variables.
>>>
>>>One small problem is that your program only works for windows then,
>>>and monsoon no longer works then under linux.
>>>
>>>You can slow down your program, bob is always a fan of that, as overhead
>>>doesn't need to be 400% of course. Bob estimates it at i think 10% or so?
>>>
>>>But by far the simplest solution to get rid of all these problems is
>>>to get multiprocessing.
>>>
>>>Whatever way you search parallel, multiprocessing is faster if you
>>>want to avoid all the tough global thread variable definitions!
>>>
>>>Also at the superb dual AMD SMP chipset there is no longer any disadvantage
>>>in multiprocessing.
>>>
>>>For BSD it even has more advantages, as bsd can only do multiprocessing,
>>>i heart multithreading might deliver problems under bsd at a multiprocessor
>>>machine.
>>>
>>>My tip go for a 0% overhead, and 0% problem thing and go multiprocessing.
>>>
>>>whether you multithread or singlethread, that hashtable you need to share
>>>anyway, so who cares?
>>>
>>>
>>>
>>>
>>>
>>
>>
>>The problem is granularity.  With threads, I can store a global variable
>>and have all threads see it instantly.  For message passing, as you are
>
>You can do that only if it is a volatile variable. In multiprocessing
>you also can use shared global variables without problems. In fact
>that's what i'm doing.

They had better be volatile no matter which way you do the parallel search,
or it is broken.


>
>>doing, the overhead is significant.  Thousands of instructions per message,
>
>I am nowhere using message passing in my engine.
>
>>which means you can't split the tree where you might one one processor to
>>search a single node.  Threads don't have that overhead.  They have exactly
>>none.
>
>I have no idea how you think i implemented this, but i'm simply sharing
>the tree datastructure.
>
>Of course i need to approach that using a pointer, but that's exactly
>how you do it.

Then your point would be?  You claim your approach is "better" yet you have
to pass a pointer around, which is exactly what I am doing.  And this is
somehow "better"??



>
>The evaluation (slowest part of my program) i can use global arrays
>without using a slow pointer which also needs to get passed to every
>single function i use, which would also increase program size considerably
>in all respects.

I don't see how that makes it "better".  It does make it "bigger" since all
your code and non-shared data is duplicated in memory N times, once for each
process you start.


>
>>
>>
>>
>>
>>>>I'm moving around data structures in my engine to consolodate things that are
>>>>going to be needed on a per-cpu basis if/when I go parallel.
>>>>
>>>>One such structure is my move stack.  It's a big array of moves with a start and
>>>>end index per ply.  So for example it might look like this:
>>>>
>>>>start[0] = 0  ...  end[0] = 32  [array entries 0..32 hold moves at ply 0]
>>>>start[1] = 33 ...  end[1] = 60  [array entries 33..60 hold moves at ply 1]
>>>>...
>>>>
>>>>Well if more than one thread is searching at once I will need more than one move
>>>>stack and more than one ply counter.  So I kept the same move stack struct and
>>>>made g_MoveStack an array:
>>>>
>>>>MOVE_STACK g_MoveStack[NUM_CPU];
>>>>
>>>>The code to access the move stack goes from this:
>>>>
>>>>g_MoveStack.iStart[g_iPly] = 0;
>>>>
>>>>to this:
>>>>
>>>>g_MoveStack[iCpuId].iStart[g_iPly[iCpuId]] = 0;
>>>>
>>>>Talk about a huge impact -- move move generator benchmark literally is 4x
>>>>slower!  These dereferences are damn expensive.  There has to be a better way,
>>>>can one of you assembly gurus give me a clue?
>>>>
>>>>Here is a solution I am thinking about -- have a struct per-thread that houses
>>>>the ply and a pointer to the start of the right move stack entry.  Then do
>>>>something like this:
>>>>
>>>>THREAD_INFO *pThreadInfo = &(g_ThreadInfo[iCurrentThreadId]);
>>>>(pThreadInfo->pMoveStack)->iStart[sThreadInfo->iCpuId] = 0;
>>>>
>>>>I bet this is just as slow though... Any advice?
>>>>
>>>>Thanks,
>>>>Scott



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.