Computer Chess Club Archives

Subject: Re: How to make your movegen 4x slower in 1 easy step

Author: Vincent Diepeveen
Date: 05:59:52 08/31/01
On August 31, 2001 at 01:58:32, Bruce Moreland wrote:

>On August 30, 2001 at 20:45:07, Vincent Diepeveen wrote:
>
>>On August 30, 2001 at 13:56:53, Scott Gasch wrote:
>>
>>Some years ago I was faced with the same problems you
>>face now.
>>
>>Without doubt the best solution for Windows is what Andrew Dados
>>suggests: the global thread variables.
>
>I looked into this.  Actually, I looked into this in your house, while you were
>asleep, after you suggested this while we were sitting in that computer shop all
>day long.
>
>Thread-based variables in MSVC suck.  Whenever you access one of them, it does
>very ugly looking stuff, which can't possibly be better than passing a pointer
>all over everywhere.  The things use a segment register (fs).  That has to be
>terrible.
>
>Needless to say, I handle this by passing a pointer all over everywhere, as in
>Gerbil.

Aha, so my solution of going multiprocessing was even smarter than I thought!
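
For what it is worth, the multiprocessing setup with only the hashtable shared could look roughly like the sketch below. It assumes a POSIX system (Linux or BSD, not Windows) and uses mmap() with MAP_SHARED plus fork(). The names HASH_ENTRY, AllocSharedHash, HashStore, HashProbe and StoreInChild are made up for illustration, and there is no locking at all, so a real engine would still have to deal with two processes writing the same entry at once.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON     /* older BSD spelling */
#endif

/* Hypothetical entry layout; a real engine stores more than this. */
typedef struct {
    uint64_t key;
    int32_t  score;
    int32_t  depth;
} HASH_ENTRY;

/* Allocate a zero-filled table that stays shared across fork().
   Returns NULL on failure. */
static HASH_ENTRY *AllocSharedHash(size_t nEntries)
{
    void *p = mmap(NULL, nEntries * sizeof(HASH_ENTRY),
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : (HASH_ENTRY *)p;
}

/* Always-replace store; no locking (assumption: races are tolerated). */
static void HashStore(HASH_ENTRY *t, size_t n, uint64_t key,
                      int score, int depth)
{
    assert(n > 0);
    HASH_ENTRY *e = &t[key % n];
    e->key   = key;
    e->score = score;
    e->depth = depth;
}

/* Returns the entry for key, or NULL if the slot holds another key.
   Key 0 is indistinguishable from an empty slot in this sketch. */
static const HASH_ENTRY *HashProbe(const HASH_ENTRY *t, size_t n,
                                   uint64_t key)
{
    const HASH_ENTRY *e = &t[key % n];
    return (e->key == key) ? e : NULL;
}

/* Fork a child "searcher" that stores one entry and exits.
   Returns 0 if the child ran and exited cleanly, -1 otherwise. */
static int StoreInChild(HASH_ENTRY *t, size_t n, uint64_t key,
                        int score, int depth)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                /* child: everything else is private */
        HashStore(t, n, key, score, depth);
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After the child exits, the parent can probe the same table and see the entry the child wrote, while every other global (move stack, ply counter and so on) stays private to each process without any extra indexing.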

>bruce
>
>>
>>One small problem is that your program then only works on Windows,
>>and Monsoon no longer works under Linux.
>>
>>You can slow down your program, Bob is always a fan of that, as the overhead
>>doesn't need to be 400% of course. Bob estimates it at, I think, 10% or so?
>>
>>But by far the simplest solution to get rid of all these problems is
>>to go multiprocessing.
>>
>>However you search in parallel, multiprocessing is faster if you
>>want to avoid all the tough global thread variable definitions!
>>
>>Also, on the superb dual AMD SMP chipset there is no longer any disadvantage
>>to multiprocessing.
>>
>>For BSD it has even more advantages, as BSD can only do multiprocessing.
>>I heard multithreading might cause problems under BSD on a multiprocessor
>>machine.
>>
>>My tip: go for a 0% overhead, 0% problem thing and go multiprocessing.
>>
>>Whether you multithread or singlethread, you need to share that hashtable
>>anyway, so who cares?
>>
>>>I'm moving around data structures in my engine to consolidate things that are
>>>going to be needed on a per-cpu basis if/when I go parallel.
>>>
>>>One such structure is my move stack.  It's a big array of moves with a start and
>>>end index per ply.  So for example it might look like this:
>>>
>>>start[0] = 0  ...  end[0] = 32  [array entries 0..32 hold moves at ply 0]
>>>start[1] = 33 ...  end[1] = 60  [array entries 33..60 hold moves at ply 1]
>>>...
>>>
>>>Well if more than one thread is searching at once I will need more than one move
>>>stack and more than one ply counter.  So I kept the same move stack struct and
>>>made g_MoveStack an array:
>>>
>>>MOVE_STACK g_MoveStack[NUM_CPU];
>>>
>>>The code to access the move stack goes from this:
>>>
>>>g_MoveStack.iStart[g_iPly] = 0;
>>>
>>>to this:
>>>
>>>g_MoveStack[iCpuId].iStart[g_iPly[iCpuId]] = 0;
>>>
>>>Talk about a huge impact -- my move generator benchmark is literally 4x
>>>slower!  These dereferences are damn expensive.  There has to be a better way;
>>>can one of you assembly gurus give me a clue?
>>>
>>>Here is a solution I am thinking about -- have a struct per-thread that houses
>>>the ply and a pointer to the start of the right move stack entry.  Then do
>>>something like this:
>>>
>>>THREAD_INFO *pThreadInfo = &(g_ThreadInfo[iCurrentThreadId]);
>>>(pThreadInfo->pMoveStack)->iStart[pThreadInfo->iPly] = 0;
>>>
>>>I bet this is just as slow though... Any advice?
>>>
>>>Thanks,
>>>Scott
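
The per-thread struct idea at the end of Scott's message, combined with passing a pointer around as Bruce describes, might look something like the sketch below. MOVE_STACK, THREAD_INFO, iStart and iPly come from the posts; MOVE, iEnd, MAX_PLY, MAX_MOVES, BeginPly, PushMove and CountMoves are made-up names for illustration. Unlike Scott's example, iEnd here points one past the last move of a ply. The move stack is embedded in the struct rather than reached through a second pointer, which is a choice of this sketch and not something the thread settles.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PLY   64        /* hypothetical limits, not from the thread */
#define MAX_MOVES 4096

typedef unsigned int MOVE;  /* stand-in for the engine's real move type */

typedef struct {
    MOVE moves[MAX_MOVES];
    int  iStart[MAX_PLY];   /* index of first move at each ply */
    int  iEnd[MAX_PLY];     /* one past the last move at each ply */
} MOVE_STACK;

/* Everything one searching thread (or process) owns. */
typedef struct {
    int        iPly;
    MOVE_STACK moveStack;
} THREAD_INFO;

/* Start generating at the given ply: its move list begins where the
   previous ply's list ended. */
static void BeginPly(THREAD_INFO *ctx, int ply)
{
    assert(ply >= 0 && ply < MAX_PLY);
    int first = (ply == 0) ? 0 : ctx->moveStack.iEnd[ply - 1];
    ctx->iPly = ply;
    ctx->moveStack.iStart[ply] = first;
    ctx->moveStack.iEnd[ply]   = first;
}

/* Append one move at the current ply. The caller resolved ctx once,
   so there is no per-access indexing by CPU id. */
static void PushMove(THREAD_INFO *ctx, MOVE mv)
{
    MOVE_STACK *ms = &ctx->moveStack;
    int ply = ctx->iPly;
    assert(ms->iEnd[ply] < MAX_MOVES);
    ms->moves[ms->iEnd[ply]++] = mv;
}

/* Number of moves currently stored for a ply. */
static int CountMoves(const THREAD_INFO *ctx, int ply)
{
    return ctx->moveStack.iEnd[ply] - ctx->moveStack.iStart[ply];
}
```

Each searcher would look up its own THREAD_INFO once at thread start and hand that pointer down through the search and the move generator. Whether this recovers the lost speed would have to be measured; the hope is only that the compiler can keep one base pointer in a register instead of recomputing g_MoveStack[iCpuId] and g_iPly[iCpuId] on every access.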
Last modified: Thu, 15 Apr 21 08:11:13 -0700