Computer Chess Club Archives



Subject: Re: Why multiprocessing is hell faster than multithreading

Author: Robert Hyatt

Date: 07:06:24 09/01/01



On September 01, 2001 at 06:59:01, Vincent Diepeveen wrote:

>On August 31, 2001 at 10:18:13, Robert Hyatt wrote:
>
>>On August 31, 2001 at 09:03:28, Vincent Diepeveen wrote:
>>
>>>On August 31, 2001 at 00:20:28, Robert Hyatt wrote:
>>>
>>>>On August 30, 2001 at 20:45:07, Vincent Diepeveen wrote:
>>>>
>>>>>On August 30, 2001 at 13:56:53, Scott Gasch wrote:
>[snip]
>>>You can do that only if it is a volatile variable. In multiprocessing
>>>you also can use shared global variables without problems. In fact
>>>that's what i'm doing.
>>
>>They had better be volatile no matter which way you do the parallel search,
>>or it is broken.
>
>Exactly.
>
>>
>>>>doing, the overhead is significant.  Thousands of instructions per message,
>>>
>>>I am nowhere using message passing in my engine.
>>>
>>>>which means you can't split the tree where you might want one processor to
>>>>search a single node.  Threads don't have that overhead.  They have exactly
>>>>none.
>>>
>>>I have no idea how you think i implemented this, but i'm simply sharing
>>>the tree datastructure.
>>>
>>>Of course i need to approach that using a pointer, but that's exactly
>>>how you do it.
>>
>>Then your point would be?  You claim your approach is "better" yet you have
>>to pass a pointer around, which is exactly what I am doing.  And this is
>>somehow "better"??
>
>Because everything that is not search does not need an extra pointer.
>
>In short if i evaluate a pattern
>
>int EvalRooksInEndgameBecauseOfA(void) {
>  int s=0;
>  if( board[sq_a2] == rook && board[sq_a3] == pawn )
>    s += blabla;
>  return s;
>}
>
>In crafty this would be
>int EvalRooksInEndgameBecauseOfA(boarddatastructureforthread *b) {
>  int s=0;
>  if( b->board[sq_a2] == rook && b->board[sq_a3] == pawn )
>    s += blabla;
>  return s;
>}

First, that is wrong.  I don't evaluate quite like that.  But regardless,
there are two issues here.  I need an extra pointer.  When I started the
SMP stuff in version 15.0, the first question I asked was "what does that
pointer cost me?"  The answer was "less than 5% in the overall speed of
the program."  That is not a lot.  Your "hell fast" is way misleading,
because you lose big-time in the EGTB stuff.  Far more than my 5% in fact.
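
To make the comparison concrete, here is a minimal sketch of the two styles under discussion: a per-process global board versus a per-thread board reached through a pointer. Everything in it is an assumption made for illustration (the piece codes, the square indices with a1 = 0, the bonus value, and the names `g_board`, `ThreadBoard`, `eval_rook_global`, `eval_rook_threaded`); it is not Crafty's or Diep's actual evaluation code.

```c
/* Hypothetical piece codes and square indices (a1 = 0, b1 = 1, ..., a2 = 8). */
enum { EMPTY = 0, PAWN = 1, ROOK = 4 };
enum { SQ_A2 = 8, SQ_A3 = 16 };

/* Hypothetical bonus, standing in for the "blabla" term in the quoted code. */
#define ROOK_BEHIND_PAWN_BONUS 10

/* Multiprocessing style: every process has its own private copy of this global. */
int g_board[64];

int eval_rook_global(void) {
    int s = 0;
    if (g_board[SQ_A2] == ROOK && g_board[SQ_A3] == PAWN)
        s += ROOK_BEHIND_PAWN_BONUS;
    return s;
}

/* Multithreading style: each thread owns one of these and passes a pointer down. */
typedef struct {
    int board[64];
} ThreadBoard;

int eval_rook_threaded(const ThreadBoard *b) {
    int s = 0;
    if (b->board[SQ_A2] == ROOK && b->board[SQ_A3] == PAWN)
        s += ROOK_BEHIND_PAWN_BONUS;
    return s;
}
```

The two functions do the same work; the threaded version carries one extra argument and one extra level of indirection per board access. How much that costs depends on the compiler and the hardware, so the "less than 5%" figure above is a measurement for one program, not something this sketch demonstrates.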

You notice that you are the _only_ person that chose to do the totally
separate process approach.  Ever wonder exactly why the rest of us chose
to use threads?  Hint:  we aren't stupid.  :)




>
>I do not need the SLOW indirection 'b->' pointer!
>
>Where you can share data i can do that too using shared memory (EGTBs too).
>So while we both use an extra pointer in search, in evaluation and
>everything else that uses data which is different for each thread,
>i can use global data and you need an extra pointer.
>
>Disassemble and be horrified at the number of extra (unnecessary)
>clocks your pointers need!


<5% I can stand.  That is _all_ it is costing.  Just compare version 14.13
to 15.0 to see the speed difference.


>
>>
>>
>>>
>>>In the evaluation (slowest part of my program) i can use global arrays
>>>without using a slow pointer which also needs to get passed to every
>>>single function i use, which would also increase program size considerably
>>>in all respects.
>>
>>I don't see how that makes it "better".  It does make it "bigger" since all
>>your code and non-shared data is duplicated in memory N times, once for each
>>process you start.
>
>That doesn't matter at all Bob if you keep in mind how L2 cache works,
>it only delivers to the cpu what it asks for, it has nothing to do
>with what the other L2 cache is storing for a different search process!


Size does matter.  When you use 4 cpus, you are wasting almost 100 megabytes
with egtb decompression indices.  The rest of us waste 25 megabytes.  If you
eventually go to 6-man files, you will be wasting gigabytes of memory.
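
The arithmetic behind those figures can be sketched as follows. It assumes, as the paragraph above does, that each process keeps its own private copy of the decompression indices while threads share a single copy; the 25 MB per-copy size is the round number from the post, and both function names are hypothetical.

```c
#include <stddef.h>

/* Separate processes: one private copy of the index data per CPU. */
size_t egtb_index_mb_processes(size_t n_cpus, size_t per_copy_mb) {
    return n_cpus * per_copy_mb;
}

/* Threads: one copy in the shared address space, regardless of CPU count. */
size_t egtb_index_mb_threads(size_t n_cpus, size_t per_copy_mb) {
    (void)n_cpus; /* unused: the footprint does not grow with the CPU count */
    return per_copy_mb;
}
```

With 4 CPUs and 25 MB per copy this gives 100 MB for processes against 25 MB for threads. If the indices were placed in shared memory, the process figure would drop accordingly.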

If I really thought separate processes were faster, I would have done it that
way.  It is _easier_ to do.  But I knew it wasn't the best way, overall, and
so I went to the trouble to pass the pointer around...


