Author: Don Dailey
Date: 16:46:40 01/23/98
Go up one level in this thread
>I would not say "trivial"... :) I wrote Cilkchess first as a serial program and got it debugged (ok, it turned out it was not so well debugged!) It took me about an hour to port it to Cilk. However keep in mind I knew it would be a parallel program. >>Aske Plaat ported an old version of Crafty to Cilk. Since Crafty >>was designed to be a serial program he had some problems with all >>the global state. But even this did not prevent him from porting >>it in a few days (even though he was completely unfamiliar with the >>code.) If anyone decides to write a parallel program I highly >>recommend encapsulating all your position state (the board and >>relevant state) into a structure. If you are concerned that >>creating new states for each node will be too expensive, don't >>be. Modern processors copy contiguous data extremely rapidly >>and you avoid some work with keeping unmake information (you never >>unmake a move.) The same code (a big benefit of using cilk) will >>produce a serial version of the program that runs very fast and I >>promise it will be competitive with your current serial program. >>(I think I hear someone in the background screaming something about >>data cache but I can't quite make out what they're saying.) > >you need to run on a PC. I once did this (copy state) and the best I >could do was 5K nodes per sec on a P5/133... when I got rid of the >state >copy and started make/unmake, I eventually got past 30K. The PC only >has a sustainable bandwidth of 60-70mb/second. Compared to 32 gigabytes >per second on a Cray. Or maybe a gig/sec on your CM-5. You will get >killed copying large structures on a PC however... This sounds way off to me but I don't know you tried to store. You are claiming more than 6 to 1 difference for copying state. This directly implies I would need about 8 processors to equal the performance of one serial processor with make/unmake. This means I would be twice as fast if I had NOT run on a 4 SMP's and went back to make/unmake. When I started using this technique (to facilitate parallel programming) I was extremely concerned about this issue. I pretty much resigned myself to biting the bullet and accepting a big slowdown for the greater benefit. But when I actually had a finished product and looked at the numbers (and compared to my previous serial program with make/unmake) I was flabbergasted. I could not tell the difference. And this was on the intel platform using linux. We had all these hacks in to reduce the state size too, we were using character variables for the board and so on. But we later figured out all of this crap was not neccessary, a few dozen bytes of state copied per move is nothing. We took out the hacks and it sped up slightly! I have no idea how to resolve this with your experience of 6 to 1 since making moves is less than 5% of our overhead and they include the state copy. But you said something about copying large structures on a PC. I am not copying large structures (there is no need.) Just the board itself and a few variables. But even if someone thought this might be a big problem, it's trivial to solve. at any node near the end nodes drop the parallelism (our quies is not parallel) and update the state in place with make/unmake. Even though our quies is not parallel I still do the state copies simply because it's no slower than make/unmake. One very nice side benefit of the position structure (other than never having to unmake a move which itself is a minor speedup) is that your code becomes much cleaner. - Don hurt things much at all) and update the state in
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.