Computer Chess Club Archives



Subject: Re: Some thoughts for those who are considering to buy a Dual processor PC

Author: Robert Hyatt

Date: 13:09:10 03/27/01



On March 27, 2001 at 15:59:22, Vincent Diepeveen wrote:

>On March 27, 2001 at 13:53:13, Robert Hyatt wrote:
>
>>On March 27, 2001 at 13:34:12, Vincent Diepeveen wrote:
>>
>>>On March 27, 2001 at 09:36:54, Robert Hyatt wrote:
>>>
>>>>On March 27, 2001 at 08:52:15, Andrew Dados wrote:
>>>>
>>>>>On March 27, 2001 at 08:17:43, Vincent Diepeveen wrote:
>>>>>
>>>>>>On March 26, 2001 at 22:44:53, Robert Hyatt wrote:
>>>>>>
>>>>>>>On March 26, 2001 at 20:54:48, Dan Andersson wrote:
>>>>>>>
>>>>>>>>I have to agree that multibanked memory and a large cache are very beneficial.
>>>>>>>>Those factors could very well explain the superlinearity.
>>>>>>>>
>>>>>>>>Regards Dan Andersson
>>>>>>>
>>>>>>>
>>>>>>>I don't think the 4-way interleaving helps.  That _barely_ lets the machine
>>>>>>>hold its own, memory-wise, because there are 4x as many cpus fighting over
>>>>>>>access to memory... making it nearly 4x faster just barely breaks even.  The
>>>>>>>larger L2 cache may well make a difference, of course...
>>>>>>
>>>>>>You speak for Crafty. I speak for DIEP.
>>>>>>I'm doing 8 probes of at least 16 bytes per entry.
>>>>>>That's 128 bytes.
>>>>>
>>>>>My guess is that the big difference for SMP DIEP is that it runs separate
>>>>>processes instead of threads. The 8 probes are just icing on that cake.
>>>>>
>>>>>As far as I know, Windows (and Unix too) loads each process into its own
>>>>>address space, so the processes will fight over the L3 cache.
>>>>>Or am I totally wrong here?
>>>>>
>>>>>-Andrew-
>>>>>
>>>>
>>>>This is correct.  Threads share one address space so this isn't such a huge
>>>>problem...
>>>
>>>But your program is slower with threads, as you need extra pointers
>>>everywhere unless you start using non-ANSI C extensions.
>>>
>>>How do I evaluate a board position in ANSI C using multithreading
>>>without needing to load an extra pointer?
>>>
>>
>>This is a moot issue.  I mentioned before that when I first converted to the
>>pointer approach, I was expecting a huge performance hit.  In reality it was
>>less than 7% and over time that dropped to under 5%.  I would bet that that
>>5% is swamped by the advantage of having one large common virtual address
>>space which prevents continual cache flushes...
>
>But that 7% would grow quite a bit if you had more patterns
>that require the board. It's certainly even more if you are non-bitboard,
>like I am, as then the program has to reload that pointer again and again,
>and in complex expressions the optimization will suffer too.

I have _one_ pointer to the per-process context.  That takes up one register.
Since that pointer is used so frequently everywhere, it tends to stick in a
register, which is ok...

As for whether the pointer is bad or not, my evaluation accesses the chess
board representations thousands of times per node. The pawn-structure
evaluation alone looks at the board so many times it is ridiculous...


>
>For DIEP I would estimate the overhead of the extra pointer at nearer 30%
>than 7%.

I don't think that simply removing one register from the x86 architecture
will cost you 30%. The x86 uses register renaming inside the CPU core, so
sequences of instructions that load, modify and store a specific register
don't run into conflicts. I would expect a 12% penalty at worst, the net
result of losing one of the eight general-purpose registers (about 12% of the
programmer-visible registers). I would really suspect it to be significantly
lower than 12% due to the register renaming that goes on inside the CPU core.




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.