Computer Chess Club Archives



Subject: Re: Crafty SMP questions.

Author: Robert Hyatt

Date: 17:03:30 08/05/03



On August 05, 2003 at 19:08:21, Matthew Hull wrote:

>On August 05, 2003 at 18:38:51, Robert Hyatt wrote:
>
>>On August 05, 2003 at 17:30:51, Matthew Hull wrote:
>>
>>>Does Crafty scale above 4 processors?  For example, could crafty utilize all the
>>>CPUs on an IBM pSeries 690 7040-681T 32-way?
>>>
>>>http://www-132.ibm.com/content/home/store_IBMPublicUSA/en_US/eServer/pSeries/high_end/690.html
>>>
>>>MH
>>
>>
>>I didn't have time to look closely.  But it is most likely a NUMA platform,
>>which means that Crafty as it exists now is not going to work well on it.  NUMA
>>machines require careful attention to what is put where in memory, so that
>>often-used data is as close to the physical processor (in terms of access
>>latency) as possible.  The current implementation of SMP in Crafty is based
>>on pure SMP, where memory is simply shared.
>>
>>If I ever have time to fiddle with a NUMA machine, I'll probably look at
>>fixing the major issue, which is to put split blocks close to each processor,
>>and when giving a specific processor a tree to search, using a split block that
>>is _close_ to it.
>
>Would the n-way speedups for NUMA be less than with a pure SMP?  I.e. you would
>not see the 1.7-1.9 you get now on Xeons, but something less perhaps.  I
>remember on IBM mainframes, there was a point of diminishing returns after about
>10-way with 12-way being the limit.  But that was a while back.  Now they have
>16-way machines that scale well on the Cheryl Watson benchmarks, but I don't
>know if they are pure SMP or not.  I suppose NUMA does not have that problem
>(just different problems as you mentioned) since I see massively parallel
>machines mentioned hereabouts.

In general, NUMA machines are not necessarily slower than non-NUMA machines.
The catch is that the programmer has to be very careful about where data is
placed in memory, so that it is laid out as close to optimally as possible with
respect to which processor needs to access which data and how frequently it
does so.

Crafty doesn't have that tweak at the moment, so it would perform very badly
on such a machine.  Critical data structures are allocated as one big array of
structures, which guarantees that they will all reside in one processor's local
memory, and drives the other processors insane trying to access them
quickly and frequently.
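
To make the layout issue concrete, here is a minimal sketch in C of the kind of
change described above: one pool of split blocks per node instead of a single
big array, with a processor preferring a block from its own pool.  This is not
Crafty's code.  The names split_block, pools_init, acquire_block and
release_block, the node and block counts, and the page size are all
hypothetical.  The sketch also does not call any NUMA API; it assumes that
physical placement would be handled separately, for example by an OS
first-touch policy or an explicit NUMA allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES        4     /* hypothetical number of NUMA nodes        */
#define BLOCKS_PER_NODE  8     /* hypothetical split blocks per node       */
#define PAGE_SIZE        4096  /* assumed page size, not queried at runtime */

/* Hypothetical stand-in for a split block; the real structure is larger. */
typedef struct split_block {
    int  node;           /* node whose pool owns this block          */
    int  in_use;         /* nonzero while a search is using it       */
    char payload[1024];  /* placeholder for position and search state */
} split_block;

/* One separately allocated pool per node, instead of one big array. */
static split_block *pool[MAX_NODES];

/* Allocate each pool as its own page-aligned region so that no page is
 * shared between two nodes' pools.  On a real NUMA system the memset
 * (the first touch) would have to run on a processor of node n, or the
 * pool would have to come from a NUMA-aware allocator; this sketch does
 * neither and only shows the data layout. */
int pools_init(void)
{
    size_t bytes = sizeof(split_block) * BLOCKS_PER_NODE;
    /* aligned_alloc requires the size to be a multiple of the alignment. */
    bytes = (bytes + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;

    for (int n = 0; n < MAX_NODES; n++) {
        pool[n] = aligned_alloc(PAGE_SIZE, bytes);
        if (pool[n] == NULL)
            return -1;
        memset(pool[n], 0, bytes);
        for (int i = 0; i < BLOCKS_PER_NODE; i++)
            pool[n][i].node = n;
    }
    return 0;
}

/* Hand out a free block, trying the caller's own node first and falling
 * back to the other nodes only when the local pool is exhausted.
 * Not thread-safe: a real implementation would need a lock per pool. */
split_block *acquire_block(int node)
{
    assert(node >= 0 && node < MAX_NODES);
    for (int d = 0; d < MAX_NODES; d++) {
        int n = (node + d) % MAX_NODES;
        for (int i = 0; i < BLOCKS_PER_NODE; i++) {
            if (!pool[n][i].in_use) {
                pool[n][i].in_use = 1;
                return &pool[n][i];
            }
        }
    }
    return NULL;  /* every pool is full */
}

/* Return a block to its pool. */
void release_block(split_block *b)
{
    b->in_use = 0;
}
```

A caller would run pools_init() once at startup and then, when giving a
processor on node n a subtree to search, call acquire_block(n) so that the
block it works in comes from node n's pool whenever one is free.  The fallback
order used here (next node number, wrapping around) is an arbitrary choice; a
real machine would more likely order the fallback by measured access latency.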



>
>MH




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.