Author: Robert Hyatt
Date: 14:06:25 08/23/04
I downloaded the "cachegrind" program (part of the valgrind package) and ran it to see if it might shed some light on the cache footprint of Crafty. I looked only at level 1 cache misses, and I varied the L1 cache size from 8K (8K bytes for instructions, 8K for data) to 16M (16M bytes for instructions, 16M for data). I set the line size as small as possible, which was 16 bytes, so each cache miss grabs 16 bytes from L2/RAM.

First, the raw data. The three columns are the cache size (I and D caches are the same size), followed by the instruction cache misses and the data cache misses, with hash=48K and hashp=16K for a total hash footprint of 64K bytes.

=== cache misses ===
size     inst      data
8K       139.6M    24.0M
16K      92.2M     11.6M
32K      52.7M     5.8M
64K      25.2M     4.2M
128K     110K      2.3M
256K     65K       1.0M
512K     14K       600K
1M       14K       384K
2M       14K       317K
4M       14K       315K
8M       14K       304K
16M      14K       304K

Estimated code size = 14K * 16 = 224K
Estimated data size = 300K * 16 = 4.8M

For this test, Crafty could search about 5K nodes per second, and I had it set to run the "benchmark" at 10 seconds per position, 60 seconds total, or about 300K nodes total search space.

Conclusion 1. Instruction cache misses drop rapidly until the i-cache hits 128K. Notice that the count eventually drops to 14K and sticks there. That has to represent the total number of 16-byte blocks of instructions needed to run the thing, which is 224K bytes. Some of that is initialization, so the actual "kernel" is probably in the 128K byte range, since 128K marked the end of the sharp drop in miss rate.

Conclusion 2. With 64K of total hash space, all the other data cache misses have to be for non-hash data. The minimum data space is 304K 16-byte lines, of which we know 64K bytes is the hash table. I see a steady decrease in cache misses (double the size, roughly halve the misses) until the 1M mark. This suggests that the memory footprint is in the range of 512K to 1024K.

Conclusion 3. This seems to match my tests on processors with 512K, 1024K and 2048K of L2, as the total memory footprint clearly blows out a 256K cache. Note that the row labeled 256K is 256K data + 256K instructions.

You can draw your own conclusions. I hate to cloud all the disinformation here with real data, but sometimes it does tend to shed light on a topic that gets talked about with no supporting data of any kind.

This is the current version of Crafty. The only changes were to add hash=48K and hashp=16K to the .craftyrc file, and to change the bench.c source to search for 10 seconds per position rather than to a fixed depth. The normal bench.c would run for hours at 5K nodes per second under the cache emulator.

If you have any questions or comments, feel free to respond...
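
For anyone who wants to redo the arithmetic behind the estimates and the "double the size, halve the misses" observation, here is a minimal Python sketch. It is not part of the original test. The names (MISSES, footprint_bytes, step_ratios) are made up for illustration, and it assumes the K and M in the miss columns are decimal thousands and millions. It only re-derives numbers from the posted table.

```python
# Sketch only: re-derives the estimates in the post from the posted miss counts.
# Assumption: "K" and "M" in the miss columns are decimal (1,000 and 1,000,000).

LINE_BYTES = 16  # cache line size used in the simulation, per the post

# (L1 size label, instruction misses, data misses), copied from the table
MISSES = [
    ("8K", 139_600_000, 24_000_000),
    ("16K", 92_200_000, 11_600_000),
    ("32K", 52_700_000, 5_800_000),
    ("64K", 25_200_000, 4_200_000),
    ("128K", 110_000, 2_300_000),
    ("256K", 65_000, 1_000_000),
    ("512K", 14_000, 600_000),
    ("1M", 14_000, 384_000),
    ("2M", 14_000, 317_000),
    ("4M", 14_000, 315_000),
    ("8M", 14_000, 304_000),
    ("16M", 14_000, 304_000),
]


def footprint_bytes(floor_misses, line_bytes=LINE_BYTES):
    """Bytes touched if every remaining miss is a first touch of a new line."""
    return floor_misses * line_bytes


def step_ratios(counts):
    """Misses at each cache size divided by misses at the previous size."""
    return [after / before for before, after in zip(counts, counts[1:])]


inst = [row[1] for row in MISSES]
data = [row[2] for row in MISSES]

# The post rounds the data floor to 300K lines; the table's last row is 304K.
print("code estimate (bytes):", footprint_bytes(inst[-1]))
print("data estimate (bytes):", footprint_bytes(300_000))

# A ratio near 0.5 means doubling the cache roughly halved the misses.
for (label, _, _), ratio in zip(MISSES[1:], step_ratios(data)):
    print(f"data misses, {label:>4}: x{ratio:.2f} vs previous size")
```

The ratios flatten toward 1.0 once the cache is large enough that only first-touch misses remain, which is the floor the footprint estimates are read from.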