Author: Bruce Moreland
Date: 21:58:29 08/31/01
Go up one level in this thread
On August 30, 2001 at 13:56:53, Scott Gasch wrote: >I'm moving around data structures in my engine to consolodate things that are >going to be needed on a per-cpu basis if/when I go parallel. > >One such structure is my move stack. It's a big array of moves with a start and >end index per ply. So for example it might look like this: > >start[0] = 0 ... end[0] = 32 [array entries 0..32 hold moves at ply 0] >start[1] = 33 ... end[1] = 60 [array entries 33..60 hold moves at ply 1] >... > >Well if more than one thread is searching at once I will need more than one move >stack and more than one ply counter. So I kept the same move stack struct and >made g_MoveStack an array: > >MOVE_STACK g_MoveStack[NUM_CPU]; > >The code to access the move stack goes from this: > >g_MoveStack.iStart[g_iPly] = 0; > >to this: > >g_MoveStack[iCpuId].iStart[g_iPly[iCpuId]] = 0; > >Talk about a huge impact -- move move generator benchmark literally is 4x >slower! These dereferences are damn expensive. There has to be a better way, >can one of you assembly gurus give me a clue? > >Here is a solution I am thinking about -- have a struct per-thread that houses >the ply and a pointer to the start of the right move stack entry. Then do >something like this: > >THREAD_INFO *pThreadInfo = &(g_ThreadInfo[iCurrentThreadId]); >(pThreadInfo->pMoveStack)->iStart[sThreadInfo->iCpuId] = 0; > >I bet this is just as slow though... Any advice? > >Thanks, >Scott You are doing this in a goofus way. I wish I could be authoritative and tell you how to do this, but I might be doing this in a goofus way, too. When I decided to make Ferret parallel, I didn't have a multi-processor machine yet, so I decided to do this same work first. I had a bunch of global variables, and the first thing I did was to stick them all in a big struct. Then I passed a pointer to the struct to *everything*. From reading Bob's post, that's what he's doing, too. When you become parallel, you just make more of these structs. You end up with overhead due to passing around these pointers, and you have overhead due to indirecting it. The overhead is not zero but it's not enormous, either. The reason I say that what you are doing is goofus is that you have a zillion weird-sized arrays you have to deal with, and dealing with the array indexes must be death. If you want to see what I did, check out Gerbil. I believe that it would be very easy to get two searches going simultaneously in Gerbil, which is like 1/3 of the way to being parallel. http://www.seanet.com/~brucemo/gerbil/gerbil.htm It's passing around a pointer to a global struct, too, even though it's not parallel and will probably never be. Someone else suggested using a funky MSVC thread based variable, but as I said in a reply to Vincent, that seems to do some ugly indirection that is probably worse than passing the pointer around. A couple of other things: >start[0] = 0 ... end[0] = 32 [array entries 0..32 hold moves at ply 0] >start[1] = 33 ... end[1] = 60 [array entries 33..60 hold moves at ply 1] C for-loops are usually done like this: for (i = 0; i < N; i++) "<" is used mostly, rather than "<=". With this in mind, you'd have a 33 in end[0]. Which makes it equal to start[1]. So you can get rid of end[], and just use start[N]..start[N+1] to delimit your stuff, which is something someone else pointed out. >THREAD_INFO *pThreadInfo = &(g_ThreadInfo[iCurrentThreadId]); >(pThreadInfo->pMoveStack)->iStart[sThreadInfo->iCpuId] = 0; You don't need any of the () in the above. "&" already has low precedence and precedence doesn't matter with "->". The cases that cause trouble are where you use "*" and "->" in the same reference, and in those cases sometimes you need to use (). bruce
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.