Author: Vasik Rajlich
Date: 06:14:26 01/20/06
On January 19, 2006 at 22:10:05, Mridul Muralidharan wrote:

>On January 19, 2006 at 16:45:37, Tord Romstad wrote:
>
>>On January 19, 2006 at 15:02:26, Dann Corbit wrote:
>>
>>>On January 19, 2006 at 03:48:34, Tord Romstad wrote:
>>>
>>>>Why? Wouldn't a chess engine using multiple processes and
>>>>shared memory be just as fast as a similar program with
>>>>multiple threads on a computer with several CPUs (or CPUs
>>>>with multiple cores)? I thought the engine using processes
>>>>would just consume a bit more memory, and not have any
>>>>significant disadvantages apart from that.
>>>>
>>>>I'm not saying that you are wrong, of course. I am totally
>>>>ignorant about this subject, and I ask in order to learn more.
>>>
>>>Process startup time, versus thread startup time, can be significant.
>>
>>Yes, of course, but I don't see why you would want to start new
>>processes except during program initialisation. It seems to me
>>that the obvious thing to do would be to start all processes at
>>once during program initialisation, and keep them running
>>throughout the game. When there is no work to do, a process
>>would simply return to some sort of idle loop.
>>
>>>An executable consumes more computer resources than a thread.
>>>
>>>It's not a stupendous difference. If you look at the efficiency of Deep Sjeng
>>>(which uses processes), it's really not that bad. The place where it would hurt
>>>the most is in the extremes (e.g. 8 CPUs with 4 cores each).
>>
>>I still haven't decided what I will use. Threads don't really
>>look that much harder to me, but perhaps I'm missing something.
>>I'm sure there are lots of difficulties I won't understand before
>>I try.
>>
>>Tord
>
>
>Hi Tord, all,
>
>First off, a potentially boring blurb... in which case, please ignore :)
>
>In general for chess, my personal preference is to use processes rather than
>threads, from a KISS perspective. I am going to ignore the contentious debates
>about performance, cache locality, etc. here and just advocate based on this
>one point ;-)
>
>From my experience, having very well defined and explicit points of
>interaction between the processes, through the shared memory and mutexes,
>really, really helps. It is more like having clearly defined interfaces
>(through data, though) - there are already enough possible race conditions
>(especially at the beginning, when you start off) without introducing
>additional possibilities if you are trying out DTS, etc. :-)
>
>It could be argued that parallelisation can be done efficiently with threads,
>and it can be - but you incur a more stack-based approach of shuffling
>intermediate variables around, and you need to be very, very careful.
>
>Example:
>In eval, I typically have an attack_table[2][64] initialised at the beginning
>(I now have an incremental attack table, but the idea is the same) and use it
>everywhere else; ditto for other forms of info like attack_patterns,
>king_safety, pawn_defects, etc.
>Similarly, my pawn tables, eval hash tables, etc. also need to be lockless
>or - assuming more complex/larger info storage in the tables - gated.
>
>If you are using threads, you would either use a stack-based approach and pass
>these around, or put them in a 'this_proc' structure and use them from there
>(so an extra indirection for nearly everything: this_proc->attack_table[], etc.).
>
>One possible downside is that EGTB probing will incur a higher penalty (from
>what I remember - I could be wrong here), since you will not get the advantage
>of the caching done by Eugene Nalimov's code across the various processes...
>this will be per process.
>Maybe someday this will be shared memory and remove that penalty!
>
>Of course, at the end of the day, it is just how you think about the problem
>and your solutions to tackle it - there are no HUGE drawbacks to either
>approach unless they are specific to some platform.
>And the problems you solve are, IMO, pretty similar in both approaches.
>And don't forget - we should have fun solving these problems :) so use the
>approach which suits you.
>
>Would be great to see a parallel Gothmog or Glaurung version out there!
>And don't worry, there is an unjust perception that writing a parallel engine
>with a good speedup is tough - it is not really that tough, from my
>experience :)
>
>Regards,
>Mridul
>
>PS:
>Simplest way to kickstart a parallel engine?
>1) Get/borrow a parallel box - at least a dual core.
>
>2) Design a parallel search on paper which can scale up to, say, 16 or 32 CPUs
>(forget that you might never run on that many anytime soon) - it is better to
>have something which can potentially scale higher than to have to rework the
>code after a couple of years. (Zappa, Diep, Sjeng, Crafty, etc.)
>
>3) The way I started 3 weeks ago was to write basic make/unmake move, movegen,
>etc. and a parallel perft, and make that work. Then build on top of it.
>This can be easily validated for correctness, and yet it should give you
>nearly perfect linear 'time to solution' scaling.
>
>4) Lots of assertions - I have an '#ifdef MP_DEBUG' build which validates that
>all my locks are acquired/freed correctly in most of the relevant portions,
>checks the shared memory state, etc.: this was a huge help in validating that
>my search was not messing up.
>
>5) Have lots of coffee - especially when debugging at night :)
>
>Now my average speedup is 1.89x in time to depth for the positions that
>Vincent posted (thanks, Daniel!), with the worst case being 1.3 and hopefully
>improving (on my dual core Athlon box).

Mridul, this 1.89x speedup - you got it by splitting off a new process (rather
than a thread)? I would be interested to hear more about this, i.e. when the
process is started and (very roughly) what you pass to it.

A "full threaded parallel search" like in Crafty seems like a good chunk of
work, not to mention a pain in the rear end to debug.

Vas
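The sketches that follow illustrate, in plain C, several of the points quoted
above. None of them is taken from Gothmog, Glaurung, Deep Sjeng, Crafty, or
Mridul's engine; all names and layouts are hypothetical and only show the shape
of each idea.

First, the data-layout trade-off Mridul describes: with threads, evaluation
data is reached through an extra 'this_proc' indirection, while with processes
each worker simply keeps its own copy of the globals.

/* Thread flavour: every search thread carries its own context object,
   so the evaluation data is reached through an extra indirection.      */
typedef struct {
    int attack_table[2][64];     /* per-side attack info, per thread    */
    int king_safety[2];          /* other per-thread eval scratch data  */
} ThreadContext;

static int eval_threaded(const ThreadContext *this_proc)
{
    /* Every access pays the extra this_proc-> indirection. */
    int score = 0;
    for (int sq = 0; sq < 64; sq++)
        score += this_proc->attack_table[0][sq] - this_proc->attack_table[1][sq];
    return score;
}

/* Process flavour: each worker process keeps its own copy of the globals
   (copy-on-write after fork), so the eval code itself is untouched.      */
static int attack_table[2][64];

static int eval_process(void)
{
    int score = 0;
    for (int sq = 0; sq < 64; sq++)
        score += attack_table[0][sq] - attack_table[1][sq];
    return score;
}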
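Mridul does not say which lockless scheme he has in mind for the pawn and eval
hash tables. A well-known one in computer chess is the XOR-checksum trick
described by Hyatt and Mann and used in Crafty; a minimal sketch, with a
hypothetical two-word entry layout:

#include <stdint.h>

/* Hypothetical entry layout: a 64-bit key and 64 bits of packed data. */
typedef struct {
    uint64_t key;     /* stored as key ^ data                          */
    uint64_t data;    /* score, depth, best move, bound type, packed   */
} HashEntry;

/* Store without taking a lock: if another thread or process tears the
   write, the checksum in hash_probe fails and the entry is ignored.    */
static void hash_store(HashEntry *e, uint64_t key, uint64_t data)
{
    e->data = data;
    e->key  = key ^ data;
}

/* Probe: the entry is trusted only if the XOR checksum matches. */
static int hash_probe(const HashEntry *e, uint64_t key, uint64_t *data)
{
    uint64_t d = e->data;
    if ((e->key ^ d) != key)
        return 0;             /* miss, or corrupted by a torn write */
    *data = d;
    return 1;
}

A torn write from another searcher simply shows up as a failed checksum and is
treated as a miss, so no lock is needed on the hot path.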
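On the process side, Tord's suggestion (start all processes once, during
initialisation, and let them idle) combined with Mridul's "shared memory and
mutexes" interaction points maps naturally onto a common POSIX pattern: map an
anonymous shared region before fork() and place a process-shared mutex and the
shared tables inside it. This is a generic sketch under those assumptions, not
how any of the engines mentioned actually does it; SharedState, worker_loop,
and the fields shown are made up.

#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical layout of the state the processes agree to share. */
typedef struct {
    pthread_mutex_t lock;     /* guards the split-point bookkeeping    */
    int             stop;     /* set by the master to abort a search   */
    /* ... transposition table, split-point queue, node counters ...   */
} SharedState;

static SharedState *shared_init(void)
{
    /* An anonymous shared mapping is inherited by every child forked
       later, so all processes see the same physical memory.            */
    SharedState *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return NULL;

    /* The mutex must be marked process-shared to work across fork(). */
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&s->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return s;
}

int main(void)
{
    SharedState *s = shared_init();
    if (s == NULL)
        return 1;

    /* Start all workers once, at initialisation; each would then sit
       in an idle loop waiting for work, as discussed in the thread.    */
    int ncpus = 2;                          /* e.g. a dual-core box */
    for (int i = 1; i < ncpus; i++) {
        if (fork() == 0) {
            /* worker_loop(s);  hypothetical: poll *s for work */
            _exit(0);
        }
    }

    /* The master search hands out work and collects results through *s. */
    return 0;
}

Whatever the master hands a worker (a split point, a slice of root moves)
would live in this region, which is the sort of detail Vas is asking about.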
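For point 3 of the PS, a process-based parallel perft can be as simple as
splitting the root moves across a few forked workers and summing their counts
through a pipe. The engine primitives (generate_moves, make_move, unmake_move,
perft) are stubs standing in for whatever the engine provides, and error
handling is omitted. Because each worker gets its own copy-on-write image of
the position after fork(), no locking is needed at all, which is what makes
the near-linear scaling easy to check.

#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical engine primitives - not from any real engine. */
typedef struct { int from, to; } Move;
extern int  generate_moves(Move *list);        /* fills the root move list */
extern void make_move(const Move *m);
extern void unmake_move(const Move *m);
extern unsigned long long perft(int depth);    /* ordinary serial perft    */

/* Split the root moves round-robin over nproc forked workers; each worker
   writes its subtotal into a pipe and the parent sums the subtotals.       */
unsigned long long parallel_perft(int depth, int nproc)
{
    Move list[256];
    int  nmoves = generate_moves(list);
    int  fds[2];
    pipe(fds);

    for (int w = 0; w < nproc; w++) {
        if (fork() == 0) {
            unsigned long long sub = 0;
            for (int i = w; i < nmoves; i += nproc) {
                make_move(&list[i]);
                sub += perft(depth - 1);
                unmake_move(&list[i]);
            }
            write(fds[1], &sub, sizeof sub);   /* 8 bytes: atomic on a pipe */
            _exit(0);
        }
    }

    unsigned long long total = 0, sub;
    for (int w = 0; w < nproc; w++) {
        read(fds[0], &sub, sizeof sub);
        total += sub;
    }
    while (wait(NULL) > 0)
        ;
    return total;
}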
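Point 4 mentions an '#ifdef MP_DEBUG' build that validates lock usage and
shared-memory state. Mridul's actual macros are not shown anywhere in the
thread; one simple way to get the same kind of checking, purely as an
illustration, is to wrap the lock calls and assert on an owner field. The
mutex is assumed to have been initialised as process-shared, as in the
earlier sketch.

#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* A lock wrapper that, in MP_DEBUG builds, remembers which process holds
   it, so mismatched acquire/release pairs trip an assertion instead of
   silently corrupting the shared search state.                            */
typedef struct {
    pthread_mutex_t mtx;
#ifdef MP_DEBUG
    volatile pid_t  owner;    /* 0 when free, pid of the holder otherwise */
#endif
} DebugLock;

#ifdef MP_DEBUG
#define LOCK(l)                                                    \
    do {                                                           \
        assert((l)->owner != getpid());  /* no double lock */      \
        pthread_mutex_lock(&(l)->mtx);                             \
        (l)->owner = getpid();                                     \
    } while (0)

#define UNLOCK(l)                                                  \
    do {                                                           \
        assert((l)->owner == getpid());  /* only the holder frees */ \
        (l)->owner = 0;                                            \
        pthread_mutex_unlock(&(l)->mtx);                           \
    } while (0)
#else
#define LOCK(l)   pthread_mutex_lock(&(l)->mtx)
#define UNLOCK(l) pthread_mutex_unlock(&(l)->mtx)
#endif

In the release build the wrappers compile down to the bare pthread calls, so
the checking costs nothing once the search is trusted.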