Author: Robert Hyatt
Date: 17:52:23 10/23/98
Go up one level in this thread
On October 23, 1998 at 10:39:40, James Robertson wrote: >On October 23, 1998 at 02:43:41, Roberto Waldteufel wrote: > >> >>On October 22, 1998 at 16:34:11, Robert Hyatt wrote: >> >>>On October 22, 1998 at 14:03:01, Roberto Waldteufel wrote: >>> >>>> >>>>I know this may seem like a dumb question, but could you explain exactly how the >>>>cache works? I know the basic idea is to speed up memory acces by holding >>>>frequently accessed data in a special place, the "cache", from where it may be >>>>accessed more quickly, but I don't know much more than that. Is the cache part >>>>of RAM, or is it held on the CPU chip, or on a separate chip? What is the >>>>significance of L1 and L2 cache? I have heard that sometimes the cache is >>>>accessed at different speeds depending on the machine. Is it possible to add >>>>more cache, and if so would this be likely to improve performance? The most >>>>important thing I wold like to understand is how I can organise my programs so >>>>as to extract maximum benefit from the cache available (I use a P II 333MHz). >>>> >>>>In the Spanish Championships I heard that some programs (including mine), >>>>especially the bigger ones, were at a disadvantage due to the absence of L2 >>>>cache on the Celeron machines. I don't know how big an effect this may have had. >>>>If I better understood how caching works, maybe I could improve the way I code >>>>things. I'm afraid I'm rather a novice regarding hardware details like this. >>>> >>>>Thanks in advance, >>>>Roberto >>> >>> >>> >>>The idea is simple. On the pentium/pentiumII/etc machines, cache is broken >>>up into "lines" of 32 bytes. When the CPU fetches a word, instruction, >>>a byte, etc, the entire block of 32 bytes (one line) is copied to the cache >>>in a very fast burst. Then, from this point forward, rather than waiting >>>for 20+ cpu cycles to fetch data, it is fetched from the L2 cache in 2 >>>cycles on the Pentium II, or 1 cycle on the pentium pro or xeons. It can >>>make the program run much faster as you might guess, in that when you >>>modify memory, you really only modify the cache, and memory is updated way >>>later if possible. So a loop might increment a value 100 times, but memory >>>is only fetched once and written to once, which saves substantial amounts >>>of time. >>> >>>The plain celeron 300 doesn't have any L2 cache, while the newer celeronA >>>or any that are 333mhz and faster have 128KB of L2 cache that is as fast >>>as the pentium pro/xeon (core cpu speed rather than core cpu speed / 2) >>> >>>But it is nothing more than a high-speed buffer between the CPU and >>>memory, but it is obviously much smaller than memory... >> >>Hi Bob, >> >>Thanks for the explanation. Would I be right in thinking that data structures of >>exactly 32 bytes lengthe would be the most efficient on PII machines, so that >>the structure fits exactly into one "line" of cache? > >I think that the computer uses the same method it does with the registers; i.e. >the computer reads and writes in 32 bit chunks, regardless of the size of the >data being transferred. It will take 2 machine cycles for a 64 bit structure, >but an 8 bit and a 32 bit structure will both take 1 machine cycle. I think this >is how it works; if I am wrong please correct me!! > >James It's a complicated issue, but basically the pentium *always* reads 32 bytes at a time using 4 consecutive cycles, so it can fill a complete cache line. The main thing is to put data close together that will be used at about the same time to take advantage of this "pre-fetch"... > >> >>Best wishes, >>Roberto
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.