Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: How to make your movegen 4x slower in 1 easy step

Author: Bruce Moreland

Date: 22:58:32 08/30/01

Go up one level in this thread


On August 30, 2001 at 20:45:07, Vincent Diepeveen wrote:

>On August 30, 2001 at 13:56:53, Scott Gasch wrote:
>
>Some years ago i was faced with the same problems as you
>face now with.
>
>Without doubt the best solution for windows is what Andrew Dados
>suggests, the global thread variables.

I looked into this.  Actually, I looked into this in your house, while you were
asleep, after you suggested this while we were sitting in that computer shop all
day long.

Thread-based variables in MSVC suck.  Whenever you access one of them, it does
very ugly looking stuff, which can't possibly be better than passing a pointer
all over everywhere.  The things use a segment register (fs).  That has to be
terrible.

Needless to say, I handle this by passing a pointer all over everywhere, as in
Gerbil.

bruce

>
>One small problem is that your program only works for windows then,
>and monsoon no longer works then under linux.
>
>You can slow down your program, bob is always a fan of that, as overhead
>doesn't need to be 400% of course. Bob estimates it at i think 10% or so?
>
>But by far the simplest solution to get rid of all these problems is
>to get multiprocessing.
>
>Whatever way you search parallel, multiprocessing is faster if you
>want to avoid all the tough global thread variable definitions!
>
>Also at the superb dual AMD SMP chipset there is no longer any disadvantage
>in multiprocessing.
>
>For BSD it even has more advantages, as bsd can only do multiprocessing,
>i heart multithreading might deliver problems under bsd at a multiprocessor
>machine.
>
>My tip go for a 0% overhead, and 0% problem thing and go multiprocessing.
>
>whether you multithread or singlethread, that hashtable you need to share
>anyway, so who cares?
>
>
>
>
>
>>I'm moving around data structures in my engine to consolodate things that are
>>going to be needed on a per-cpu basis if/when I go parallel.
>>
>>One such structure is my move stack.  It's a big array of moves with a start and
>>end index per ply.  So for example it might look like this:
>>
>>start[0] = 0  ...  end[0] = 32  [array entries 0..32 hold moves at ply 0]
>>start[1] = 33 ...  end[1] = 60  [array entries 33..60 hold moves at ply 1]
>>...
>>
>>Well if more than one thread is searching at once I will need more than one move
>>stack and more than one ply counter.  So I kept the same move stack struct and
>>made g_MoveStack an array:
>>
>>MOVE_STACK g_MoveStack[NUM_CPU];
>>
>>The code to access the move stack goes from this:
>>
>>g_MoveStack.iStart[g_iPly] = 0;
>>
>>to this:
>>
>>g_MoveStack[iCpuId].iStart[g_iPly[iCpuId]] = 0;
>>
>>Talk about a huge impact -- move move generator benchmark literally is 4x
>>slower!  These dereferences are damn expensive.  There has to be a better way,
>>can one of you assembly gurus give me a clue?
>>
>>Here is a solution I am thinking about -- have a struct per-thread that houses
>>the ply and a pointer to the start of the right move stack entry.  Then do
>>something like this:
>>
>>THREAD_INFO *pThreadInfo = &(g_ThreadInfo[iCurrentThreadId]);
>>(pThreadInfo->pMoveStack)->iStart[sThreadInfo->iCpuId] = 0;
>>
>>I bet this is just as slow though... Any advice?
>>
>>Thanks,
>>Scott



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.