Computer Chess Club Archives



Subject: Real data on cache working set

Author: Robert Hyatt

Date: 14:06:25 08/23/04


I downloaded the "cachegrind" program (part of the valgrind package) and ran it
to see if it might shed some light on the cache footprint of Crafty.

I looked only at level 1 cache misses, and I varied the L1 cache size from 8K
(8K bytes for instructions, 8K for data) to 16M (16M bytes for instructions, 16M
for data).  I set the line size as small as possible, which was 16 bytes.  So
each cache miss grabs 16 bytes from L2/RAM.
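
A sweep like this can be scripted.  The sketch below only builds the command
lines and does not run them.  It relies on cachegrind's --I1 and --D1 options,
which take size,associativity,line-size in bytes.  The associativity (2) and the
./crafty path are assumptions, since the post gives neither, and recent valgrind
releases may also need --cache-sim=yes before any cache simulation is done.

```python
# Build cachegrind command lines for an L1 size sweep (commands are not run here).
LINE_SIZE = 16        # bytes; the smallest line size, per the post
ASSOC = 2             # assumption: the post does not state the associativity
BINARY = "./crafty"   # hypothetical path to the program under test


def cachegrind_cmd(l1_bytes):
    """Return argv for one run with equal-sized I1 and D1 caches."""
    spec = f"{l1_bytes},{ASSOC},{LINE_SIZE}"
    return ["valgrind", "--tool=cachegrind", f"--I1={spec}", f"--D1={spec}", BINARY]


# 8K through 16M, doubling each step: the 12 configurations in the table.
sizes = [8 * 1024 * 2**i for i in range(12)]
commands = [cachegrind_cmd(s) for s in sizes]

if __name__ == "__main__":
    for cmd in commands:
        print(" ".join(cmd))
```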

First, the raw data:

The three columns are cache size (I and D cache same size), followed by the
instruction cache misses and data cache misses with hash=48K, hashp=16K, for a
total hash footprint of 64k bytes.

            === cache misses ===
size        inst            data
8K        139.6M           24.0M
16K        92.2M           11.6M
32K        52.7M            5.8M
64K        25.2M            4.2M
128K        110K            2.3M
256K         65K            1.0M
512K         14K            600K
1M           14K            384K
2M           14K            317K
4M           14K            315K
8M           14K            304K
16M          14K            304K

Estimated code size =  14K * 16 = 224K
Estimated data size = 300K * 16 = 4.8M
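
The reasoning behind these two estimates: once the cache is big enough to hold
everything, the only misses left are the first touch of each line, so the miss
floor times the line size approximates the bytes touched.  A minimal sketch of
that arithmetic, assuming K = 1000 as the figures above do (footprint_bytes is
an illustrative name, not anything from cachegrind or Crafty):

```python
K = 1000        # the post's arithmetic treats K as 1000 (14K * 16 = 224K)
LINE_SIZE = 16  # bytes fetched per miss


def footprint_bytes(floor_misses, line_size=LINE_SIZE):
    """Miss floor times line size: an estimate of the distinct bytes touched."""
    return floor_misses * line_size


code_bytes = footprint_bytes(14 * K)    # instruction miss floor
data_bytes = footprint_bytes(300 * K)   # data miss floor, rounded as in the post

if __name__ == "__main__":
    print(f"code ~ {code_bytes / K:.0f}K bytes, data ~ {data_bytes / (K * K):.1f}M bytes")
```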

For this test, Crafty could search about 5K nodes per second under the cache
emulator, and I had it set to run the "benchmark" at 10 seconds per position,
60 seconds total, for about 300K nodes of total search space.
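
The node count follows directly from those figures, and it also lets the table
be read as misses per node.  A rough sketch, using only the numbers quoted above
(the per-node values are derived here, not measured separately):

```python
NODES_PER_SEC = 5_000       # approximate speed quoted for this test
SECONDS_PER_POSITION = 10
TOTAL_SECONDS = 60

positions = TOTAL_SECONDS // SECONDS_PER_POSITION
total_nodes = NODES_PER_SEC * TOTAL_SECONDS

# Misses per node for the 8K row of the table (139.6M instruction, 24.0M data).
inst_per_node_8k = 139.6e6 / total_nodes
data_per_node_8k = 24.0e6 / total_nodes

if __name__ == "__main__":
    print(positions, total_nodes, round(inst_per_node_8k), round(data_per_node_8k))
```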

Conclusion 1.  Instruction cache misses drop rapidly until the i-cache reaches
128K.  Notice that the count eventually drops to 14K and sticks there.  That has
to represent the total number of 16-byte blocks of instructions needed to run
the thing.  That is 224KB.  Some of that is initialization, so the actual
"kernel" is probably in the 128KB range, since 128KB marked the end of the sharp
drop in the miss rate.
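
One way to locate that knee mechanically is to look for the largest ratio
between successive rows of the instruction column.  This is only a sketch over
the table values above; sharpest_drop is an illustrative helper, and "largest
ratio" is just one possible definition of where the sharp drop ends.

```python
# Instruction-cache misses from the table, smallest cache first.
inst_misses = [
    ("8K", 139.6e6), ("16K", 92.2e6), ("32K", 52.7e6), ("64K", 25.2e6),
    ("128K", 110e3), ("256K", 65e3), ("512K", 14e3), ("1M", 14e3),
]


def sharpest_drop(rows):
    """Return (label, ratio) for the row whose misses fell most vs the previous row."""
    best = None
    for (_, prev), (label, cur) in zip(rows, rows[1:]):
        ratio = prev / cur
        if best is None or ratio > best[1]:
            best = (label, ratio)
    return best


knee_label, knee_ratio = sharpest_drop(inst_misses)

if __name__ == "__main__":
    print(f"sharpest drop at {knee_label}: {knee_ratio:.0f}x fewer misses")
```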

Conclusion 2.  With 64K of total hash space, all the other data cache misses
have to be for non-hash data.  The minimum data space is 304K 16-byte lines, of
which we know 64K bytes (4K lines) represent the hash table.  I see a steady
decrease in cache misses, with each doubling of the size roughly halving the
misses, until the 1M mark.  This suggests that the memory footprint is in the
range of 512K to 1024K.
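
The "double the size, halve the misses" pattern can be checked by dividing each
data-miss count by the next one.  A small sketch over the table values; the 1.25
cutoff for "has stopped dropping" is an arbitrary assumption chosen for
illustration.

```python
# Data-cache misses from the table, smallest cache first.
data_misses = [
    ("8K", 24.0e6), ("16K", 11.6e6), ("32K", 5.8e6), ("64K", 4.2e6),
    ("128K", 2.3e6), ("256K", 1.0e6), ("512K", 600e3), ("1M", 384e3),
    ("2M", 317e3), ("4M", 315e3), ("8M", 304e3), ("16M", 304e3),
]

# Improvement factor gained by doubling up to each size (2.0 means "halved").
ratios = [
    (label, prev / cur)
    for (_, prev), (label, cur) in zip(data_misses, data_misses[1:])
]

FLAT = 1.25  # assumption: below this factor, doubling no longer helps much
first_flat = next(label for label, r in ratios if r < FLAT)

if __name__ == "__main__":
    for label, r in ratios:
        print(f"{label:>5}  {r:.2f}x")
    print("curve flattens at", first_flat)
```

The halving is only approximate: the 64K step, for example, gains about 1.4x
rather than 2x.  With this cutoff, 1M is the last size that still shows a clear
gain, which is the mark named in the conclusion.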

Conclusion 3.  This seems to match my tests on processors with 512K, 1024K, and
2048K L2 caches, as the total memory footprint clearly blows out a 256K cache.
Note that the row labeled 256K is 256K data + 256K instructions.

You can draw your own conclusions.

I hate to cloud all the disinformation here with real data, but sometimes it
does tend to shed light on a topic that gets talked about with no supporting
data of any kind.

This is the current version of Crafty.  The only changes were to add hash=48K
and hashp=16K to the .craftyrc file, and to change the bench.c source to search
for 10 seconds rather than to a fixed depth.  The normal bench.c would run for
hours at 5K nodes per second under the cache emulator.

If you have any questions or comments, feel free to respond...




Last modified: Thu, 15 Apr 21 08:11:13 -0700
