Computer Chess Club Archives

Search

Terms

Messages

Subject: Re: Neverending story with incomplete tablebases

Author: Robert Hyatt

Date: 09:13:34 08/19/03

On August 18, 2003 at 17:11:12, Russell Reagan wrote:

>On August 18, 2003 at 16:24:09, Sune Fischer wrote:
>
>>If I cared about simplicity I would not be going parallel, I want performance!
>>:)
>
>Well if you're going for "real" parallel, that's great. If you're only trying to
>do "kind of" parallel, then I don't think the millisecond it costs to start up a
>couple of threads before a search is going to hurt performance that much.
>
>I think we're talking about different things though. I'm only talking about
>starting up threads for each root move, so maybe 35-40 per iteration, and never
>more than 2 or 4 at a time. If your approach means that you would have to be
>constantly starting threads for each node, then your approach is better of
>course :)
>

The drawback is such an approach starts off with a very bad performance
potential.  Do a search and compare the time needed to search the _first_
root move with the time to search all the other root moves.  The first move
generally takes more time than all the other moves combined.  If you do this
in parallel, using four threads, then you will have four "first moves to
search" and they will take the same long time.  The speedup will be in the
1.1-1.3X range, and never get better except for rare cases.

That's why more sophisticated splitting approaches are used in other
programs...

>
>>Yes this is what I need, do you think I want to be stuck in windows forever? :)
>
>Well, if you only need those simple operations, then it would probably help
>performance. That would mean non-portability via Windows functions or assembler.
>It probably wouldn't be too bad to write a version for Windows (or just use the
>Interlocked* functions) and then write a handful of assembler routines for
>linux.
>
>Mutexes/etc. aren't terribly slow in my experience, and will be tuned for
>efficiency, but there is some overhead that can't really be avoided. I imagine
>there would be hardly any measurable overhead using the Interlocked* functions
>or the assembly that they convert to (cmov or whatever).

The problem is that mutexes _block_ when a lock is already held by another
process.  That turns into a _huge_ delay.  The process has to be blocked and
before the blocking operation is completed, the lock has already been
released by the other thread.  Then the process has to be unblocked and
scheduled.  That's unbearable.  If you lock for long periods of time, it is
better to use mutex locks, but in chess, the lock duration is generally for
only a few nanoseconds at most.  Mutex locks will kill performance there.

This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.