Author: Robert Hyatt
Date: 15:06:29 10/25/01
On October 25, 2001 at 13:26:26, Tom Kerrigan wrote:

>On October 24, 2001 at 23:13:13, Robert Hyatt wrote:
>
>>On October 24, 2001 at 17:16:06, Tom Kerrigan wrote:
>>
>>>On October 24, 2001 at 15:23:35, Robert Hyatt wrote:
>>>
>>>>"numbers" simply don't require such large representations, which wastes a lot
>>>>of bus bandwidth transferring 128 bit values when the majority are 16 bits or
>>>>less...
>>>
>>>The only bus bandwidth that's really wasted is in the datapath, which doesn't
>>>really matter. Just because the datapath is 128 bits doesn't mean all memory
>>>transactions have to be.
>>>
>>>-Tom
>>
>>
>>Sure it does. In fact, the memory datapath is _always_ a multiple of the
>>wordsize, otherwise super-scalar won't work at all.
>>
>>IE Intel uses 64 bit data paths. Alphas use 256. Cray does it totally
>>different but they gate pairs of words (128 bits) to/from memory...
>
>Depends on what you mean by datapath. I'm using the comp org term, i.e., the
>register file, ALU, and busses in between.

I was using the term to describe how many bits are transferred between the
CPU and memory. Internal to the CPU is a more common term, I agree.

In the case of a 64 bit cpu, the internal CPU datapaths are certainly 64 bits.
And if you are using 32 bit values, it is wasting time depending on how you look
at it. Because you _could_ be transferring 64 bit values if you had them, but
in reality you are transferring 32 bit values, and wasting 1/2 of the available
bandwidth..

I think that is where bitmap programs are best suited, because they use the
entire 64 bits most of the time...

>Using this term, Intel is 32-bit and
>Alpha is 64-bit. I don't understand your superscalar comment--not every
>instruction is a load/store. Why wouldn't you be able to issue two 32-bit ADD
>instructions on a 8192-bit CPU just as easily as on a 32-bit CPU? And if your
>load/store instructions only loaded and stored a fraction of a register at a
>time, then the width of your memory interface doesn't matter that much, either.
>Of course, a lot of ALU bits would be wasted, but oh well.

As you mentioned, _most_ benchmarks are not testing the cpu-cache or
cpu-register performance. They are testing the cpu-memory performance. And
super-scalar machines are dogs if they can't pony up enough data to keep the
pipes flowing. The Cray excels here, for example. It fetches 4 words per cycle
(64 bit words) and stores 2 words per cycle, per processor, to keep the 2-way
superscalar stuff flowing smoothly.

>
>-Tom
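
The bandwidth arithmetic in the reply above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each transfer moves exactly one value (no packing of several small values into one transfer), and the function names `wasted_fraction` and `bits_per_cycle` are hypothetical, made up for this sketch. The only figures taken from the thread are 32 bit values on a 64 bit path, 16 bit values on a 128 bit path, and the Cray's 4 fetches plus 2 stores of 64 bit words per cycle.

```python
def wasted_fraction(value_bits: int, path_bits: int) -> float:
    """Fraction of one transfer's width that carries no useful data.

    Assumption for this sketch: one value per transfer, no packing.
    """
    if not 0 < value_bits <= path_bits:
        raise ValueError("value must be positive and fit in one transfer")
    return 1 - value_bits / path_bits


def bits_per_cycle(loads: int, stores: int, word_bits: int) -> int:
    """Total bits moved to and from memory in one cycle."""
    return (loads + stores) * word_bits


if __name__ == "__main__":
    # 32 bit values on a 64 bit path: half the width is unused.
    print(wasted_fraction(32, 64))    # 0.5
    # 16 bit values on a 128 bit path, as in the earlier quoted post.
    print(wasted_fraction(16, 128))   # 0.875
    # A 64 bit bitmap on a 64 bit path uses the whole width.
    print(wasted_fraction(64, 64))    # 0.0
    # Cray figures from the post: 4 fetches + 2 stores of 64 bit words.
    print(bits_per_cycle(4, 2, 64))   # 384
```

Under those assumptions a bitmap program wastes none of a 64 bit transfer, while a program moving 32 bit values leaves half of each transfer empty, which is the 1/2 figure in the reply.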
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.