Author: Roberto Waldteufel
Date: 22:14:12 10/23/98
Go up one level in this thread
On October 23, 1998 at 20:52:23, Robert Hyatt wrote: >On October 23, 1998 at 10:39:40, James Robertson wrote: > >>On October 23, 1998 at 02:43:41, Roberto Waldteufel wrote: >> >>> >>>On October 22, 1998 at 16:34:11, Robert Hyatt wrote: >>> >>>>On October 22, 1998 at 14:03:01, Roberto Waldteufel wrote: >>>> >>>>> >>>>>I know this may seem like a dumb question, but could you explain exactly how the >>>>>cache works? I know the basic idea is to speed up memory acces by holding >>>>>frequently accessed data in a special place, the "cache", from where it may be >>>>>accessed more quickly, but I don't know much more than that. Is the cache part >>>>>of RAM, or is it held on the CPU chip, or on a separate chip? What is the >>>>>significance of L1 and L2 cache? I have heard that sometimes the cache is >>>>>accessed at different speeds depending on the machine. Is it possible to add >>>>>more cache, and if so would this be likely to improve performance? The most >>>>>important thing I wold like to understand is how I can organise my programs so >>>>>as to extract maximum benefit from the cache available (I use a P II 333MHz). >>>>> >>>>>In the Spanish Championships I heard that some programs (including mine), >>>>>especially the bigger ones, were at a disadvantage due to the absence of L2 >>>>>cache on the Celeron machines. I don't know how big an effect this may have had. >>>>>If I better understood how caching works, maybe I could improve the way I code >>>>>things. I'm afraid I'm rather a novice regarding hardware details like this. >>>>> >>>>>Thanks in advance, >>>>>Roberto >>>> >>>> >>>> >>>>The idea is simple. On the pentium/pentiumII/etc machines, cache is broken >>>>up into "lines" of 32 bytes. When the CPU fetches a word, instruction, >>>>a byte, etc, the entire block of 32 bytes (one line) is copied to the cache >>>>in a very fast burst. Then, from this point forward, rather than waiting >>>>for 20+ cpu cycles to fetch data, it is fetched from the L2 cache in 2 >>>>cycles on the Pentium II, or 1 cycle on the pentium pro or xeons. It can >>>>make the program run much faster as you might guess, in that when you >>>>modify memory, you really only modify the cache, and memory is updated way >>>>later if possible. So a loop might increment a value 100 times, but memory >>>>is only fetched once and written to once, which saves substantial amounts >>>>of time. >>>> >>>>The plain celeron 300 doesn't have any L2 cache, while the newer celeronA >>>>or any that are 333mhz and faster have 128KB of L2 cache that is as fast >>>>as the pentium pro/xeon (core cpu speed rather than core cpu speed / 2) >>>> >>>>But it is nothing more than a high-speed buffer between the CPU and >>>>memory, but it is obviously much smaller than memory... >>> >>>Hi Bob, >>> >>>Thanks for the explanation. Would I be right in thinking that data structures of >>>exactly 32 bytes lengthe would be the most efficient on PII machines, so that >>>the structure fits exactly into one "line" of cache? >> >>I think that the computer uses the same method it does with the registers; i.e. >>the computer reads and writes in 32 bit chunks, regardless of the size of the >>data being transferred. It will take 2 machine cycles for a 64 bit structure, >>but an 8 bit and a 32 bit structure will both take 1 machine cycle. I think this >>is how it works; if I am wrong please correct me!! >> >>James > > >It's a complicated issue, but basically the pentium *always* reads 32 bytes at >a time using 4 consecutive cycles, so it can fill a complete cache line. The >main thing is to put data close together that will be used at about the same >time to take advantage of this "pre-fetch"... > > I think I start to get the idea now. Say I am using a hash table of a million entries, and I have, say, 4 pieces of information to store in each entry I use. Then I do better to have one array of a million single, multi-purpose records than to have 4 separate arrays of a million entries each, one for each type of information. That way, when I access the first piece of information, I "pre-fetch" the others at the same time, whereas with four separate arrays, the information resides in 4 distant locations, right? I never really considered this sort of thing before, but maybe I could redesign a lot of my arrays in the light of the cache usage. Best wishes, Roberto > > >> >>> >>>Best wishes, >>>Roberto
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.