Computer Chess Club Archives


Subject: Re: Here is your new data...

Author: Robert Hyatt

Date: 20:15:10 08/24/04


Cache misses by L1I/L1D size (each cache):

                  60 sec run          120 sec run
  size           inst     data       inst     data
   32K          45.6M     9.6M      67.3M    12.3M
   64K          21.8M     3.0M      43.5M     5.4M
  128K           303K     1.9M       517K     3.2M
  256K           252K     922K       456K     1.5M
  512K            14K     510K        14K     649K
 1024K            14K     367K        14K     413K
 2048K            14K     307K        14K     309K

I ran the same tests, from 32K X 2 to 2048K X 2 (separate I and D caches of the
same size), 16-byte line size, 16-way set associativity, first for 60 seconds
and then for 120 seconds.  The table above gives the L1I and L1D cache misses
for each test.
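
For anyone who wants the geometry behind those settings, here is a rough sketch
(mine, in Python, not part of the test run) of how many lines and sets each
simulated cache has, given the 16-byte lines and 16-way associativity stated
above.  The function name cache_sets is just an illustrative helper.

```python
def cache_sets(size_bytes, line_bytes=16, ways=16):
    """Number of sets in a set-associative cache of the given size."""
    return size_bytes // (line_bytes * ways)

# Cache sizes used in the tests, in KB (each of the I and D caches).
SIZES_KB = [32, 64, 128, 256, 512, 1024, 2048]

# Map each size to its number of sets for 16-byte lines, 16-way.
geometry = {kb: cache_sets(kb * 1024) for kb in SIZES_KB}

for kb, sets in geometry.items():
    lines = kb * 1024 // 16  # total cache lines at 16 bytes per line
    print(f"{kb:5d}K: {lines:7d} lines, {sets:5d} sets")
```

With 16 ways, a working set only starts conflicting once more than 16 hot lines
map to the same set, so at these sizes the misses are mostly about capacity.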

I conclude that 128K is not big enough for the I cache, but it is far closer to
"right" than 64K.  The best I-cache size is 512K, as there the cache misses drop
to 14K and stick.  After 60 seconds there are no additional instruction cache
misses at 512K.  At 256K there are still almost 2x as many misses after 120 secs
as after 60 secs (456K vs 252K), so the entire working set is not fitting in
256K.  This is obviously only for instructions.

For data, cache misses seem to start leveling off somewhere between 512K and
1024K, but they are still growing at 1024K (367K after 60 secs, 413K after 120).
Again this suggests that the _total_ cache footprint is beyond 256K.  Beyond
512K.  Beyond 1024K.  2048K seems to be the size to really stop cache misses in
their tracks (307K vs 309K).
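
The reasoning above boils down to one ratio per cache size: misses after 120
secs divided by misses after 60 secs.  A ratio near 1.0 means the working set
fits and only the initial fill misses remain.  Here is a small Python sketch of
that arithmetic using the numbers from the table; the helper names (parse,
growth) are made up for illustration.

```python
def parse(count):
    """Convert a table entry such as '45.6M' or '303K' to a number."""
    scale = {"K": 1e3, "M": 1e6}[count[-1]]
    return float(count[:-1]) * scale

# size: (inst 60s, data 60s, inst 120s, data 120s), copied from the table.
MISSES = {
    "32K":   ("45.6M", "9.6M", "67.3M", "12.3M"),
    "64K":   ("21.8M", "3.0M", "43.5M", "5.4M"),
    "128K":  ("303K",  "1.9M", "517K",  "3.2M"),
    "256K":  ("252K",  "922K", "456K",  "1.5M"),
    "512K":  ("14K",   "510K", "14K",   "649K"),
    "1024K": ("14K",   "367K", "14K",   "413K"),
    "2048K": ("14K",   "307K", "14K",   "309K"),
}

def growth(size):
    """Return (inst ratio, data ratio) of 120-sec misses to 60-sec misses."""
    i60, d60, i120, d120 = map(parse, MISSES[size])
    return i120 / i60, d120 / d60

for size in MISSES:
    gi, gd = growth(size)
    print(f"{size:>6}: inst x{gi:.2f}  data x{gd:.2f}")
```

The instruction ratio reaches 1.0 at 512K, while the data ratio does not get
close to 1.0 until 2048K, which is the same conclusion as above.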

_AGAIN_ this matches my previous results: 1M is better than 512K, and 2M is
better than 1M.  It also matches Eugene's result that 3.0M is better than 1.5M.

I don't see anything further to do.  It seems pretty clear to me...

Feel free to dream up _another_ scenario since this one doesn't seem to fit your
hypothesis either.

Crafty was run with no EGTB support compiled in whatsoever, and was run with
hash=48K and hashp=12K again, for a total hash memory requirement of only 64K
bytes.  The first run was the 6 bench positions at 10 secs each, then they were
run again at 20 secs each.  All run on my PIV 2.8 box.  No threads compiled in.
No threads (SMP) stuff used.  _Real_ data would be somewhat worse, since this
only uses one thread block, where a real run would use somewhere around 24 max
for two processors.  Or maybe 60 for four on the quad Opteron I use.  But for
now, the above numbers were produced with a stripped-down no-EGTB, no-thread
version.

What next???



