Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Question about data structures

Author: Robert Hyatt

Date: 10:10:55 11/30/98

Go up one level in this thread


On November 30, 1998 at 12:01:05, Tom Kerrigan wrote:

>This is all fine and good for larger arrays that you don't access a lot, but if
>you're talking about your representation of the board,
>
>int chess_board[64];
>
>will be a serious win over
>
>char chess_board[64];
>
>-Tom
>


this isn't true...  look at crafty's source.  I have char chess_board[64]
there.  I just turned it to an int and slowed down 3%.  Ditto for anything
that fits in a char...  this trick lets my entire chess board fit into two
cache lines and stay there, rather than fitting into 8 cache lines and getting
displaced a lot.

as I said, it is not the number of instructions, it is the number of memory
references.  A single cache miss on my chess board brings in 1/2 of it in one
gulp.  as opposed to only one rank.

YMMV, but those are the results I get, running on both a pentium pro 200 and
on a PII/300.

On a 2 meg xeon it might change since cache is so much bigger.  But on a 256K
or 512K cache machine, for my program, chars are better everywhere that they
are large enough...

>On November 30, 1998 at 10:23:21, Robert Hyatt wrote:
>
>>On November 30, 1998 at 03:03:18, Tom Kerrigan wrote:
>>
>>>On November 30, 1998 at 02:07:26, David Blackman wrote:
>>>
>>>>If it fits in 8 bits, use an 8 bit data type. You might want to hide
>>>>it in a typedef in case you have to change the type later on.
>>>>Reducing memory bandwidth and fitting more useful information in cache
>>>>is a big win on modern CPUs.
>>>
>>>No.
>>>
>>>It takes time to divine 8 bits from the 32 bits that most modern CPUs work with.
>>>
>>>With small arrays, using 32 bit elements will be a performance win. How much
>>>depends on how much the arrays are accessed, though...
>>>
>>>-Tom
>>
>>
>>this really depends.  I did a *lot* of benchmarking in Crafty, and found that on
>>intel platforms, where memory bandwidth is *very low*, that packing things
>>together is a win...  because you are maximizing cache hits, *not* number of
>>instructions.  I'd much rather have to do the equivalent of a shift/and to
>>extract 8 bits than to have to wait 20-40 clock cycles for a new word to trickle
>>in from main memory.
>>
>>Best performance hint for current microprocessors (of any type) is to do what
>>you can to minimize memory traffic.  If you pack 4 bytes together in a single
>>word (actually you should think 32 byte chunks here since a cache line is 32
>>bytes and it is filled at one burst) you should try to be sure that these
>>"pieces" are used relatively closely to each other (in time) so that the cache
>>line "pre-fetch" will produce data that is useful.
>>
>>On a Cray, I would use words, because I have essentially infinite memory
>>bandwidth.  Took a while to get used to the tiny soda straw between the cpu
>>and memory on the microprocessor machines after using an aquaduct on the Cray
>>for so long.  :)



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.