Author: Robert Hyatt
Date: 07:04:56 03/18/03
On March 18, 2003 at 07:38:39, Matt Taylor wrote:

>On March 18, 2003 at 00:01:44, Robert Hyatt wrote:
>
>>On March 17, 2003 at 22:59:30, Aaron Gordon wrote:
>>
>>>On March 17, 2003 at 18:47:27, Eugene Nalimov wrote:
>>>
>>>>I just run the experiment. I used 2 otherwise identical 64-bit systems, one with
>>>>3Mb of L3 cache, other with 1.5Mb. Machine with bigger cache run Crafty's
>>>>"bench" comman 12% faster (1 CPU).
>>>>
>>>>That means that
>>>>(1) Crafty's working set don't fit into 1.5Mb,
>>>>(2) For systems with cache 1.5Mb or less (i.e. for almost all x86 systems) for
>>>>Crafty memory speed matter.
>>>>
>>>>Thanks,
>>>>Eugene
>>>
>>>Those types of systems aren't what people normally use. Most people here have a
>>>Pentium 3, Athlon, Pentium 4, etc. Here is something I found with Crafty.
>>>
>>>Using the Nforce2 chipset I'm able to run the ram at speeds from 50% up to 200%
>>>(100% being synchronous) of the fsb speed. I tested 200MHz FSB (400DDR) with
>>>200MHz memory (400DDR) and 200fsb with 100MHz memory (200DDR).
>>>The difference between ~1.6gb/s memory and ~3.2gb/s memory with craftys 'bench'
>>>command was 0.14%. Yes, about one seventh of one percent.
>>
>>That might well suggest _another_ bottleneck in that particular machine....
>
>What would that be?
>
>I ran a similar test on my AthlonXP 2500 w/nForce 2 chipset. Running the memory
>bus at 100 MHz or 133 MHz didn't make a significant difference in nps. The
>processor scored around 1.12 MN/s, and it scored some 20-30 KN/s more with a 133
>MHz memory bus. The FSB was 166 MHz in both cases.
>
>-Matt

Were I guessing, I would guess the following:

1. no interleaving, which means that the raw memory latency is stuck at 120+ns and stays there. Faster bus means nothing without interleaving, if latency is the problem.

2. Crafty is dependent mainly on latency although it does a lot of reads as well.
But if latency is the bottleneck, then a faster bus is not going to help, except for whatever boost it gets from tricks used to load a cache line faster by streaming in data. When a chipset really interleaves, the first reference cycle is going to take whatever memory latency demands, but successive cycles will be faster, as the remaining 8-byte chunks each come in one bus cycle later. That makes every 32-byte fetch faster with interleaving than without.
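
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is only a toy model, not a measurement: the 120 ns latency, 8-byte chunks and 32-byte line come from the text above, while the function name, the bus speeds plugged in, and especially the assumption that a non-interleaved fetch pays the full latency for every chunk are mine, chosen purely for illustration.

```python
def line_fill_ns(latency_ns, bus_mhz, line_bytes=32, chunk_bytes=8,
                 interleaved=True):
    """Rough time in ns to fetch one cache line (hypothetical toy model)."""
    chunks = line_bytes // chunk_bytes   # 32 / 8 = 4 transfers per line
    cycle_ns = 1000.0 / bus_mhz          # one bus cycle in nanoseconds
    if interleaved:
        # First chunk pays the raw latency, each later chunk arrives
        # one bus cycle after the previous one.
        return latency_ns + (chunks - 1) * cycle_ns
    # ASSUMPTION (simplification): with no interleaving every chunk pays
    # the full latency, so bus speed drops out of the result entirely.
    return chunks * latency_ns


if __name__ == "__main__":
    for mhz in (100, 200):
        print(mhz, "MHz:",
              "interleaved", line_fill_ns(120, mhz), "ns,",
              "not interleaved", line_fill_ns(120, mhz, interleaved=False), "ns")
```

In this model, doubling the bus clock changes nothing at all in the non-interleaved case and only trims the interleaved case slightly, because the 120 ns first access dominates either way. Real chipsets are more complicated than this, so treat the numbers as an illustration of the argument, not a prediction.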
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.