Author: James Robertson
Date: 07:39:40 10/23/98
Go up one level in this thread
On October 23, 1998 at 02:43:41, Roberto Waldteufel wrote: > >On October 22, 1998 at 16:34:11, Robert Hyatt wrote: > >>On October 22, 1998 at 14:03:01, Roberto Waldteufel wrote: >> >>> >>>I know this may seem like a dumb question, but could you explain exactly how the >>>cache works? I know the basic idea is to speed up memory acces by holding >>>frequently accessed data in a special place, the "cache", from where it may be >>>accessed more quickly, but I don't know much more than that. Is the cache part >>>of RAM, or is it held on the CPU chip, or on a separate chip? What is the >>>significance of L1 and L2 cache? I have heard that sometimes the cache is >>>accessed at different speeds depending on the machine. Is it possible to add >>>more cache, and if so would this be likely to improve performance? The most >>>important thing I wold like to understand is how I can organise my programs so >>>as to extract maximum benefit from the cache available (I use a P II 333MHz). >>> >>>In the Spanish Championships I heard that some programs (including mine), >>>especially the bigger ones, were at a disadvantage due to the absence of L2 >>>cache on the Celeron machines. I don't know how big an effect this may have had. >>>If I better understood how caching works, maybe I could improve the way I code >>>things. I'm afraid I'm rather a novice regarding hardware details like this. >>> >>>Thanks in advance, >>>Roberto >> >> >> >>The idea is simple. On the pentium/pentiumII/etc machines, cache is broken >>up into "lines" of 32 bytes. When the CPU fetches a word, instruction, >>a byte, etc, the entire block of 32 bytes (one line) is copied to the cache >>in a very fast burst. Then, from this point forward, rather than waiting >>for 20+ cpu cycles to fetch data, it is fetched from the L2 cache in 2 >>cycles on the Pentium II, or 1 cycle on the pentium pro or xeons. It can >>make the program run much faster as you might guess, in that when you >>modify memory, you really only modify the cache, and memory is updated way >>later if possible. So a loop might increment a value 100 times, but memory >>is only fetched once and written to once, which saves substantial amounts >>of time. >> >>The plain celeron 300 doesn't have any L2 cache, while the newer celeronA >>or any that are 333mhz and faster have 128KB of L2 cache that is as fast >>as the pentium pro/xeon (core cpu speed rather than core cpu speed / 2) >> >>But it is nothing more than a high-speed buffer between the CPU and >>memory, but it is obviously much smaller than memory... > >Hi Bob, > >Thanks for the explanation. Would I be right in thinking that data structures of >exactly 32 bytes lengthe would be the most efficient on PII machines, so that >the structure fits exactly into one "line" of cache? I think that the computer uses the same method it does with the registers; i.e. the computer reads and writes in 32 bit chunks, regardless of the size of the data being transferred. It will take 2 machine cycles for a 64 bit structure, but an 8 bit and a 32 bit structure will both take 1 machine cycle. I think this is how it works; if I am wrong please correct me!! James > >Best wishes, >Roberto
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.