Author: Robert Hyatt
Date: 13:09:10 03/27/01
On March 27, 2001 at 15:59:22, Vincent Diepeveen wrote:

>On March 27, 2001 at 13:53:13, Robert Hyatt wrote:
>
>>On March 27, 2001 at 13:34:12, Vincent Diepeveen wrote:
>>
>>>On March 27, 2001 at 09:36:54, Robert Hyatt wrote:
>>>
>>>>On March 27, 2001 at 08:52:15, Andrew Dados wrote:
>>>>
>>>>>On March 27, 2001 at 08:17:43, Vincent Diepeveen wrote:
>>>>>
>>>>>>On March 26, 2001 at 22:44:53, Robert Hyatt wrote:
>>>>>>
>>>>>>>On March 26, 2001 at 20:54:48, Dan Andersson wrote:
>>>>>>>
>>>>>>>>I have to agree that multibanked memory and large cache size are very beneficial.
>>>>>>>>Those factors could very well explain the superlinearity.
>>>>>>>>
>>>>>>>>Regards Dan Andersson
>>>>>>>
>>>>>>>
>>>>>>>I don't think the 4-way interleaving helps. That _barely_ lets the machine
>>>>>>>hold its own, memory-wise, because there are 4x as many cpus fighting over
>>>>>>>access to memory... making it nearly 4x faster just barely breaks even. The
>>>>>>>larger L2 cache may well make a difference, of course...
>>>>>>
>>>>>>You speak for Crafty. I speak for DIEP.
>>>>>>I'm doing 8 probes of at least 16 bytes an entry.
>>>>>>that's 128 bytes.
>>>>>
>>>>>My guess is big difference for SMP DIEP is running separate processes instead of
>>>>>threads. 8 probes is just icing on that cake.
>>>>>
>>>>>For what I know windows (unix too) will load each process to its own address
>>>>>space, so they will fight for L3 cache.
>>>>>Or am I totally wrong here?
>>>>>
>>>>>-Andrew-
>>>>>
>>>>
>>>>This is correct. Threads share one address space so this isn't such a huge
>>>>problem...
>>>
>>>but your program with threads is slower as you need extra pointers
>>>everywhere except if you start using non-ansi C standards.
>>>
>>>How do i evaluate a board position in ansi-C using multithreading
>>>without needing to load an extra pointer?
>>>
>>
>>This is a moot issue. I mentioned before that when I first converted to the
>>pointer approach, I was expecting a huge performance hit. In reality it was
>>less than 7% and over time that dropped to under 5%. I would bet that that
>>5% is swamped by the advantage of having one large common virtual address
>>space which prevents continual cache flushes...
>
>But that 7% would get quite some more if you would have more patterns
>that require the board. It's for sure even more if you would be non-bitboard,
>like i am, as then it needs to reload and reload that pointer everywhere
>and in complex expressions the optimization of it will suffer too.

I have _one_ pointer to the per-process context. That takes up one register.
Since that pointer is used so frequently everywhere, it tends to stick in a
register, which is ok...

As far as whether the pointer is bad or not, my evaluation accesses the chess
board representations thousands of times for a given node. Just the pawn
scoring evaluation alone looks at the board so many times it is ridiculous...

>
>For diep i would estimate overhead for extra pointer more near 30% as
>near 7%.

I don't think that if you just simply remove one register from the X86
architecture you will run a 30% penalty. The X86 uses register renaming inside
the cpu core, so that sequences of instructions that load, modify, and store a
specific register don't run into conflicts. I would believe a 12% penalty at
worst, the net result of losing 12% of the programmer-visible registers. I
would really suspect it would be significantly lower than 12% due to the
register renaming that goes on inside the cpu core.
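
To make the two styles under discussion concrete, here is a minimal sketch in C. It is not code from Crafty or DIEP: the names search_ctx, g_board, ctx_clear, evaluate_global and evaluate_ctx, the piece codes, and the material-only scoring are all assumptions made up for illustration. The first evaluator reads a global board, which works when every searcher is a separate process with its own copy of the globals. The second takes a single pointer to a per-searcher context, which is what threads sharing one address space need.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical piece codes, for illustration only. */
enum { EMPTY = 0, PAWN = 1, KNIGHT = 2, BISHOP = 3, ROOK = 4, QUEEN = 5, KING = 6 };

/* Hypothetical material values in centipawns, indexed by piece code. */
static const int piece_value[7] = { 0, 100, 300, 300, 500, 900, 0 };

/* Everything one searcher mutates lives in its own context. */
typedef struct {
    signed char board[64];   /* >0 white piece, <0 black piece, 0 empty */
} search_ctx;

/* Reset a context to an empty board. */
void ctx_clear(search_ctx *ctx) {
    assert(ctx != NULL);
    memset(ctx->board, 0, sizeof ctx->board);
}

/* Style 1: a single global board. Each process gets a private copy,
   but threads in one address space would all share this one. */
static signed char g_board[64];

int evaluate_global(void) {
    int score = 0;
    for (int sq = 0; sq < 64; sq++) {
        int p = g_board[sq];
        score += (p > 0) ? piece_value[p] : -piece_value[-p];
    }
    return score;
}

/* Style 2: one context pointer. Every board access goes through ctx,
   so ctx is the one extra value the compiler has to keep in a register. */
int evaluate_ctx(const search_ctx *ctx) {
    assert(ctx != NULL);
    int score = 0;
    for (int sq = 0; sq < 64; sq++) {
        int p = ctx->board[sq];
        score += (p > 0) ? piece_value[p] : -piece_value[-p];
    }
    return score;
}
```

Each thread would own its own search_ctx and pass its address down through the search, so two threads can evaluate different positions at the same time without touching shared state. As for the 12% figure: 32-bit x86 has eight general-purpose registers, so tying one up for the context pointer costs 1/8 = 12.5% of them.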
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.