Computer Chess Club Archives



Subject: Re: Intel four-way 2.8 GHz system is just Amazing ! - Not hardly

Author: Robert Hyatt

Date: 16:22:45 11/12/03



On November 12, 2003 at 19:07:32, Russell Reagan wrote:

>On November 12, 2003 at 18:09:31, Robert Hyatt wrote:
>
>>You have to jump through some hoops, and also hope that the O/S helps.  On
>>Windows, Eugene is using the set processor affinity mechanism to lock a
>>thread to a particular CPU.
>
>Does this have to be done manually or is there an API call to do this (Windows
>or Linux)?

In Windows there is an API call to do this (SetThreadAffinityMask).  I have
not studied the "NUMA" Linux kernel stuff at all, so I don't know if there is
a way to do this or not.  At the moment, for "normal" kernels, there is not,
although the process scheduler already tries to keep a process on the CPU it
last ran on, because that CPU's cache is still loaded with the process's data.
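For the Linux side of the question: current Linux kernels do provide a sched_setaffinity(2) system call, which Python exposes on Linux as os.sched_setaffinity. The following is a minimal sketch, assuming a Linux host; pin_to_cpu is a hypothetical helper name, not part of any library. The Windows counterpart, SetThreadAffinityMask, is a Win32 C call and is not shown here.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the caller (pid 0 = self) to a single CPU.

    Linux-only: os.sched_setaffinity wraps sched_setaffinity(2).
    Returns the previous CPU set so the caller can restore it later.
    """
    previous = os.sched_getaffinity(0)
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {cpu})
    return previous


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    target = min(allowed)              # pick any CPU we are allowed to use
    old = pin_to_cpu(target)
    print("now running only on", os.sched_getaffinity(0))
    os.sched_setaffinity(0, old)       # restore the original mask
```

In an engine with one search thread per CPU, each thread would pin itself to its own CPU once at startup, so that the memory it touches stays near it.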


>
>
>>The idea is that you have a CPU connected to a "router" which is
>>connected to the local memory for that CPU, plus the router is
>>connected to other routers for other processors.  Your local router can
>>access your local memory and give it to you quickly.  To access memory on
>>other processors requires that you ask your router for the memory, and it
>>has to ask a router it can reach to either give it the value or forward the
>>request on to a router that is closer, until the request finally arrives
>>at the router connected to the local memory.  It's all those "hops" that
>>kill performance.  So you just have to understand that shortcoming of the
>>NUMA architecture and work around it.  The upside is that NUMA gets around
>>a real limit: it is very hard to scale an SMP box beyond 4 CPUs.  Intel did
>>it with their FUSION chipset a couple of years back, but their machine looks
>>like two 4-way boxes coupled with a kludge, and it doesn't perform very well
>>because memory is still only 4-way interleaved while 8 processors are
>>demanding data.  The
>>NUMA approach scales better, cost-wise, but there is a performance issue
>>that must be addressed.
>
>Ah, just like networking :)

Yes.  If you are doing distributed computing on a cluster, your number one
priority is to limit communication since it is slow.  NUMA is nowhere near
that slow, but the cost of remote memory access is measurable and can make a
big difference.  Did you see the old vs new numbers for Crafty on the quad
Opteron?  That's the kind of performance differential we are talking about
here.  It can be minor, or in my case it can be very significant.
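The router-and-hops picture quoted above can be made concrete with a toy latency model. This is only a sketch under loudly stated assumptions: the four-node square topology, the LOCAL_NS and PER_HOP_NS costs, and the function names are all hypothetical illustrations, not measurements of any real Opteron or other NUMA machine. It finds the fewest router hops with a breadth-first search and charges a fixed cost per hop.

```python
from collections import deque

# Hypothetical topology: four routers (one per CPU) wired in a square,
# so each router links directly to two neighbours.
LINKS = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2],
}

LOCAL_NS = 60     # assumed cost of an access to the CPU's own memory
PER_HOP_NS = 40   # assumed extra cost for each router-to-router hop


def hops(src, dst, links=LINKS):
    """Breadth-first search for the fewest router hops from src to dst."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in links[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    raise ValueError(f"no route from {src} to {dst}")


def access_ns(cpu, home, links=LINKS):
    """Modelled latency for `cpu` reading memory attached to router `home`."""
    return LOCAL_NS + PER_HOP_NS * hops(cpu, home, links)


def average_ns(cpu, links=LINKS):
    """Mean latency if the data is spread evenly over every node's memory."""
    return sum(access_ns(cpu, home, links) for home in links) / len(links)


if __name__ == "__main__":
    for home in sorted(LINKS):
        print(f"CPU 0 -> memory on node {home}: {access_ns(0, home)} ns")
    print(f"CPU 0, data spread over all nodes: {average_ns(0)} ns on average")
```

With these made-up numbers, CPU 0 pays 60 ns for local memory but 140 ns for the node two hops away, and 100 ns on average if its data is scattered evenly. That gap is why it pays to keep each thread's data on its own node.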
>
>I get it now. Thanks for taking the time to explain it and answer my questions.


That's what I do.

:)




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.