Author: Robert Hyatt
Date: 10:10:55 11/30/98
Go up one level in this thread
On November 30, 1998 at 12:01:05, Tom Kerrigan wrote: >This is all fine and good for larger arrays that you don't access a lot, but if >you're talking about your representation of the board, > >int chess_board[64]; > >will be a serious win over > >char chess_board[64]; > >-Tom > this isn't true... look at crafty's source. I have char chess_board[64] there. I just turned it to an int and slowed down 3%. Ditto for anything that fits in a char... this trick lets my entire chess board fit into two cache lines and stay there, rather than fitting into 8 cache lines and getting displaced a lot. as I said, it is not the number of instructions, it is the number of memory references. A single cache miss on my chess board brings in 1/2 of it in one gulp. as opposed to only one rank. YMMV, but those are the results I get, running on both a pentium pro 200 and on a PII/300. On a 2 meg xeon it might change since cache is so much bigger. But on a 256K or 512K cache machine, for my program, chars are better everywhere that they are large enough... >On November 30, 1998 at 10:23:21, Robert Hyatt wrote: > >>On November 30, 1998 at 03:03:18, Tom Kerrigan wrote: >> >>>On November 30, 1998 at 02:07:26, David Blackman wrote: >>> >>>>If it fits in 8 bits, use an 8 bit data type. You might want to hide >>>>it in a typedef in case you have to change the type later on. >>>>Reducing memory bandwidth and fitting more useful information in cache >>>>is a big win on modern CPUs. >>> >>>No. >>> >>>It takes time to divine 8 bits from the 32 bits that most modern CPUs work with. >>> >>>With small arrays, using 32 bit elements will be a performance win. How much >>>depends on how much the arrays are accessed, though... >>> >>>-Tom >> >> >>this really depends. I did a *lot* of benchmarking in Crafty, and found that on >>intel platforms, where memory bandwidth is *very low*, that packing things >>together is a win... because you are maximizing cache hits, *not* number of >>instructions. I'd much rather have to do the equivalent of a shift/and to >>extract 8 bits than to have to wait 20-40 clock cycles for a new word to trickle >>in from main memory. >> >>Best performance hint for current microprocessors (of any type) is to do what >>you can to minimize memory traffic. If you pack 4 bytes together in a single >>word (actually you should think 32 byte chunks here since a cache line is 32 >>bytes and it is filled at one burst) you should try to be sure that these >>"pieces" are used relatively closely to each other (in time) so that the cache >>line "pre-fetch" will produce data that is useful. >> >>On a Cray, I would use words, because I have essentially infinite memory >>bandwidth. Took a while to get used to the tiny soda straw between the cpu >>and memory on the microprocessor machines after using an aquaduct on the Cray >>for so long. :)
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.