Author: Mridul Muralidharan
Date: 19:10:05 01/19/06
On January 19, 2006 at 16:45:37, Tord Romstad wrote:

>On January 19, 2006 at 15:02:26, Dann Corbit wrote:
>
>>On January 19, 2006 at 03:48:34, Tord Romstad wrote:
>>
>>>Why? Wouldn't a chess engine using multiple processes and
>>>shared memory be just as fast as a similar program with
>>>multiple threads on a computer with several CPUs (or CPUs
>>>with multiple cores)? I thought the engine using processes
>>>would just consume a bit more memory, and not have any
>>>significant disadvantages apart from that.
>>>
>>>I'm not saying that you are wrong, of course. I am totally
>>>ignorant about this subject, and I ask in order to learn more.
>>
>>Process startup time, versus thread startup time, can be significant.
>
>Yes, of course, but I don't see why you would want to start new
>processes except during program initialisation. It seems to me
>that the obvious thing to do would be to start all processes at
>once during program initialisation, and keep them running
>throughout the game. When there is no work to do, a process
>would simply return to some sort of idle loop.
>
>>An executable consumes more computer resources than a thread.
>>
>>It's not a stupendous difference. If you look at the efficiency of Deep Sjeng
>>(which uses processes) it's really not that bad. The place where it would hurt
>>the most is in the extremes (e.g. 8 CPUs with 4 cores each).
>
>I still haven't decided what I will use. Threads don't really
>look that much harder to me, but perhaps I'm missing something.
>I'm sure there are lots of difficulties I won't understand before
>I try.
>
>Tord

Hi Tord, all,

First off, a potentially boring blurb ... in which case, please ignore :)

In general for chess, my personal preference is to use processes rather than threads, from a KISS perspective. I am going to ignore the contentious debates about performance, cache locality and so on here, and just advocate based on this one point ;-)

From my experience, having very well defined and explicit points of interaction between the processes, through the shared memory and mutexes, really helps. It is more like having clearly defined interfaces (through data, though) - there is already a sufficiently high number of possible race conditions (especially when you are starting off) without introducing additional possibilities if you are trying out DTS, etc. :-)

It could be argued that parallelisation can be done efficiently with threads, and it can - but you do incur a more stack-based approach of shuffling intermediate variables around, and you need to be very, very careful.

Example: in eval, I typically have an attack_table[2][64] initialised at the beginning (now I have an incremental attack table - but the idea is the same, I would guess) and use it everywhere else; ditto for other forms of info like attack_patterns, king_safety, pawn_defects, etc. Similarly, my pawn tables, eval hash tables, etc. would also need to be lockless or - assuming more complex/larger info storage in the tables - gated. If you are using threads, either you use a stack-based approach and pass all of this around, or you put it in a 'this_proc' structure and use it from there (so an extra indirection for nearly everything: this_proc->attack_table[], etc.).

One possible downside is that EGTB probing will incur a higher penalty (from what I remember - I could be wrong here), since you will not get the advantage of the caching done by Eugene Nalimov's code across the various processes ... the cache will be per process. Maybe someday this will be shared memory and remove that penalty!
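To make the indirection point a bit more concrete, here is a rough sketch (made-up names, not my real code) of what the two styles look like for something like the attack table:

/* Rough sketch only - illustrative names, not real engine code. */

/* Process per CPU: each worker process has its own copy of the
 * "globals", so eval code just uses them directly. */
int attack_table[2][64];

int evaluate(void)
{
    /* ... fill and read attack_table[][] directly ... */
    return attack_table[0][28] - attack_table[1][36];   /* placeholder */
}

/* Thread per CPU: the same state has to live in a per-thread struct,
 * and nearly every access picks up an extra indirection. */
typedef struct {
    int attack_table[2][64];
    /* pawn tables, king safety info, ... */
} ThreadState;

int evaluate_mt(ThreadState *this_proc)
{
    /* ... this_proc->attack_table[][] everywhere ... */
    return this_proc->attack_table[0][28] - this_proc->attack_table[1][36];
}

With processes you still share exactly what you choose to share (hash tables, split points) through shared memory behind a mutex, which keeps the interaction points few and easy to audit.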
Of course, at the end of the day, it is just how you think about the problem and how you tackle it - there are no HUGE drawbacks to either approach unless they are specific to some platform, and the problems you solve are IMO pretty similar in both approaches. And don't forget - we should have fun solving these problems :) so use the approach which suits you. Would be great to see a parallel Gothmog or Glaurung version out there! And don't worry, there is an unjust perception that writing a parallel engine with good speedup is tough - it is not really that tough, from my experience :)

Regards,
Mridul

PS: Simplest way to kickstart a parallel engine?

1) Get/borrow a parallel box - at least a dual core.

2) Design a parallel search on paper which can scale up to, say, 16 or 32 CPUs (forget that you might never run on that many anytime soon) - it is better to have something which can potentially scale higher than to have to rework the code after a couple of years. (Zappa, Diep, Sjeng, Crafty, etc.)

3) The way I started 3 weeks ago was to write basic make/unmake move, movegen, etc. and a parallel perft, and make this work, then build on top of it. This can be easily validated for correctness and yet should give you nearly perfect linear 'time to solution' scaling. (See the sketch after this list.)

4) Lots of assertions - I have an '#ifdef MP_DEBUG' which validates that all my locks are acquired/freed in most of the relevant portions, the shared memory state, etc.: this was a huge help in verifying that my search was not messing up.

5) Have lots of coffee - especially when debugging at night :)

Now my average speedup is 1.89x in time to depth for the positions that Vincent posted (thanks Daniel!), with the worst case being 1.3 and hopefully improving (on my dual core Athlon box).
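For step 3, here is a minimal sketch of the kind of process-based parallel perft I mean (POSIX fork/mmap, so Unix-like systems only). The Position type and the functions generate_moves, make_move, unmake_move and perft are just placeholders for whatever your engine already has - only the splitting of the root moves and the shared-memory result array matter here:

#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_MOVES 256

/* placeholders for your engine's own interface */
typedef struct Position Position;
int  generate_moves(Position *pos, int moves[MAX_MOVES]);
void make_move(Position *pos, int move);
void unmake_move(Position *pos, int move);
unsigned long long perft(Position *pos, int depth);

unsigned long long parallel_perft(Position *pos, int depth, int nprocs)
{
    int moves[MAX_MOVES];
    int nmoves = generate_moves(pos, moves);

    /* shared-memory array: each child writes its own subtotals here */
    unsigned long long *counts =
        mmap(NULL, nmoves * sizeof *counts, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    /* split the root moves round-robin across nprocs child processes;
     * each child gets a copy-on-write copy of the position, so one
     * child's make_move never disturbs another's */
    for (int p = 0; p < nprocs; p++) {
        if (fork() == 0) {                    /* child */
            for (int i = p; i < nmoves; i += nprocs) {
                make_move(pos, moves[i]);
                counts[i] = perft(pos, depth - 1);
                unmake_move(pos, moves[i]);
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)                    /* wait for all children */
        ;

    unsigned long long total = 0;
    for (int i = 0; i < nmoves; i++)
        total += counts[i];
    munmap(counts, nmoves * sizeof *counts);
    return total;
}

No locking is needed here because each child writes only to its own slots of the shared array; in the search proper, the shared split points are where the mutexes (and the MP_DEBUG style assertions) come in.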