Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Question about data structures

Author: Robert Hyatt

Date: 06:43:25 12/04/98

Go up one level in this thread


On December 04, 1998 at 02:45:21, Frank Schneider wrote:

>On December 03, 1998 at 00:10:54, Robert Hyatt wrote:
>
>>On December 02, 1998 at 22:27:27, Bruce Moreland wrote:
>>
>>>
>>>On December 02, 1998 at 14:30:59, Tom Kerrigan wrote:
>>>
>>>>You're right, if other people are running faster with chars, then the statement
>>>>I made was wrong.
>>>>
>>>>I tested this with my own program, and moving from chars to ints for a few
>>>>critical arrays sped up the program by more than 5%. Maybe I'm the exception to
>>>>the rule.
>>>
>>>Could be, that's data too.  Another argument for trying it.
>>>
>>>I think it is possible that either could be better in specific situations.
>>>
>>>The chars are smaller which may have some influence on cache behavior, and the
>>>ints are bite-sized as far as the processor is concerned, so it doesn't have to
>>>mess around with clearing the top-most bits.
>>>
>>>bruce
>>
>>
>>There's two other explanations too.
>>
>>(1) changing the size of the board caused something *else* to become better
>>aligned on a cache boundary.  IE the famous case where you can delete an unused
>>variable declaration and your program slows down or speeds up.
>>
>Just one example:
>I once removed a FIXED16 from a struct (about >100byes large) and the
>program sped up by 5%.
>
>>(2) the famous direct-mapped L2 cache problem where even the same program can
>>vary in NPS significantly due to the real memory pages it occupies not mapping
>>into the L2 cache very efficiently sometimes...
>>
>>Both are a pain.
>
>I also made another observation, but I'm not sure if it is nonsense.
>The speed of the program seems to depend on the ordering of the list
>of objectfiles the linker gets. Could that be a cache-prefetch related
>effect?
>
>Frank


Indirectly.  L2 cache is direct-mapped.  To optimally fill L2, you need the
executable in consecutive pages of physical ram, or at least in consecutive
addresses as far as the low-order log2(L2_size) bits of the addressed memory
are concerned.  IE if have a new xeon with 1024K of L2 cache, the low-order 20
bits of the physical addresses are used to map 32-byte lines into the L2 cache.
For max performance, you want *the* 1024K of your program that is used most
often to be located in memory addresses that are "consecutive" in the lower
20 bits of addresses.  If you carefully order the objects (I do, check the Make
file for Crafty) you can get the "engine" to fit into this scheme, and let the
infrequently used stuff go elsewhere.  Then you can get the entire engine to fit
into L2 cache.

That said, it doesn't work quite like that, because when you build an
executable, you are building something with a virtual address space that is
optimized, but you then have to let the operating system load those pages into
physical memory, and nothing makes it load consecutive pages into consecutive
physical addresses.  So you *still* won't get optimal placement.  Some operating
systems use a "page coloring" scheme to optimize this for you, but most
(including any windows variant) don't.  Which is why you can run the same
program multiple times (after rebooting and running different combinations of
other software) and get different NPS numbers.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.