Author: Robert Hyatt
Date: 09:13:34 08/19/03
Go up one level in this thread
On August 18, 2003 at 17:11:12, Russell Reagan wrote: >On August 18, 2003 at 16:24:09, Sune Fischer wrote: > >>If I cared about simplicity I would not be going parallel, I want performance! >>:) > >Well if you're going for "real" parallel, that's great. If you're only trying to >do "kind of" parallel, then I don't think the millisecond it costs to start up a >couple of threads before a search is going to hurt performance that much. > >I think we're talking about different things though. I'm only talking about >starting up threads for each root move, so maybe 35-40 per iteration, and never >more than 2 or 4 at a time. If your approach means that you would have to be >constantly starting threads for each node, then your approach is better of >course :) > The drawback is such an approach starts off with a very bad performance potential. Do a search and compare the time needed to search the _first_ root move with the time to search all the other root moves. The first move generally takes more time than all the other moves combined. If you do this in parallel, using four threads, then you will have four "first moves to search" and they will take the same long time. The speedup will be in the 1.1-1.3X range, and never get better except for rare cases. That's why more sophisticated splitting approaches are used in other programs... > >>Yes this is what I need, do you think I want to be stuck in windows forever? :) > >Well, if you only need those simple operations, then it would probably help >performance. That would mean non-portability via Windows functions or assembler. >It probably wouldn't be too bad to write a version for Windows (or just use the >Interlocked* functions) and then write a handful of assembler routines for >linux. > >Mutexes/etc. aren't terribly slow in my experience, and will be tuned for >efficiency, but there is some overhead that can't really be avoided. I imagine >there would be hardly any measurable overhead using the Interlocked* functions >or the assembly that they convert to (cmov or whatever). The problem is that mutexes _block_ when a lock is already held by another process. That turns into a _huge_ delay. The process has to be blocked and before the blocking operation is completed, the lock has already been released by the other thread. Then the process has to be unblocked and scheduled. That's unbearable. If you lock for long periods of time, it is better to use mutex locks, but in chess, the lock duration is generally for only a few nanoseconds at most. Mutex locks will kill performance there.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.