Author: Slater Wold
Date: 14:45:25 05/30/04
On May 30, 2004 at 16:15:54, Robert Hyatt wrote:

>On May 30, 2004 at 15:41:30, Vincent Diepeveen wrote:
>
>>On May 29, 2004 at 11:30:27, Robert Hyatt wrote:
>>
>>[snip]
>>>See above. _no_ improvement. Raw latency on opteron is 1/2 the raw latency on
>>>the K7 and Intel boxes. But mapping adds 2 extra memory accesses on the opteron
>>>which does away with any actual advantage...
>>>
>>>>
>>>>Softwarebenches like linbench and such pumping sequential a few gigabytes
>>>>through the machine and then divide that by the search time. Then you have
>>>>bandwidth. 1/bandwidth = latency they claim.
>>>
>>>
>>>But that is the latency _you_ are quoting when you say opteron is 1/2 the
>>>latency of the K7. In your worst-case it is _not 1/2. It is the same.
>>
>>Let's show you the tested facts K7 versus A64:
>>Opteron single cpu 2.5 cas versus k7 cas 2.5. Note the k7 has all memory banks
>>filled the opteron does *not* it just has a single dimm and is single channel
>>and not even dual channel. So actually the latency is better than shown here.
>>Quad opteron tested at 120 ns latency for a single cpu in fact when i tried a
>>while ago.
>>
>>E:\dblat>dblat 300000000
>>Setting up a random access pattern, may take a while
>>Finished
>>Random access: 13.156 s, 131.560 ns/access
>>Testing same pattern again
>>Random access: 13.374 s, 133.740 ns/access
>>Setting up a different random access pattern, may take a while
>>Finished
>>Random access: 13.343 s, 133.430 ns/access
>>Testing same pattern again
>>Random access: 13.265 s, 132.650 ns/access
>>Sequential access offset 1: 0.250 s, 2.500 ns/access
>>Sequential access offset 2: 0.484 s, 4.840 ns/access
>>Sequential access offset 4: 0.875 s, 8.750 ns/access
>>Sequential access offset 8: 1.781 s, 17.810 ns/access
>>Sequential access offset 16: 3.375 s, 33.750 ns/access
>>Sequential access offset 32: 6.265 s, 62.650 ns/access
>>Sequential access offset 64: 6.516 s, 65.160 ns/access
>>Sequential access offset 128: 7.000 s, 70.000 ns/access
>>Sequential access offset 256: 7.938 s, 79.380 ns/access
>>Sequential access offset 512: 9.188 s, 91.880 ns/access
>>Sequential access offset 1024: 9.875 s, 98.750 ns/access
>>
>>Now the dual k7. all banks filled. a-brand memory.
>>C:\tries>dblat 300000000
>>Setting up a random access pattern, may take a while
>>Finished
>>Random access: 36.266 s, 362.660 ns/access
>>Testing same pattern again
>>Random access: 36.406 s, 364.060 ns/access
>>Setting up a different random access pattern, may take a while
>>Finished
>>Random access: 36.250 s, 362.500 ns/access
>>Testing same pattern again
>>Random access: 36.484 s, 364.840 ns/access
>>Sequential access offset 1: 0.906 s, 9.060 ns/access
>>Sequential access offset 2: 1.766 s, 17.660 ns/access
>>Sequential access offset 4: 3.437 s, 34.370 ns/access
>>Sequential access offset 8: 6.891 s, 68.910 ns/access
>>Sequential access offset 16: 13.875 s, 138.750 ns/access
>>Sequential access offset 32: 19.093 s, 190.930 ns/access
>>Sequential access offset 64: 19.156 s, 191.560 ns/access
>>Sequential access offset 128: 19.328 s, 193.280 ns/access
>>Sequential access offset 256: 19.719 s, 197.190 ns/access
>>Sequential access offset 512: 20.437 s, 204.370 ns/access
>>Sequential access offset 1024: 21.860 s, 218.600 ns/access
>>
>>So practical difference for computerchess :
>>
>>363 / 132 = 2.75 times faster latency for the opteron
>>
>>On die memory controller isn't that stupid nah?
>
>Never said it was. I _did_ say that if you blow out the TLB on the K7 and on
>the Opteron, the average access times are close.
>
>raw latency on opteron is about 70ns to do _one_ memory read. To read a random
>access word, where the TLB fails, requires 5 memory reads. No way to avoid it,
>and it is going to cost 350ns. _period_. On the K7, average latency is about
>125ns to do _one_ memory read. To read a random access word, where the TLB
>fails, requires 3 memory reads. Or about 375ns.
>
>Those are _real_ numbers, reported by _many_ people including AMD.
>
>
>I have no idea what your program above does, and really don't care. But the
>opteron has a much bigger TLB, if you don't blow it out by referring to at least
>2048 different pages, then you are not comparing apples to apples. Opteron has
>1024 TLB entries. Enough to efficiently address 4 megs of RAM (1024 * 4kb
>pages). Or if your O/S is smart enough, 2 gigs of ram with 1024 entries * 2M
>page size.

http://chessprogramming.org/cccsearch/ccc.php?art_id=306858

It's actually a neat little utility.
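
For anyone who wants to check the arithmetic in the quoted posts, here is a minimal sketch in Python. It only restates the figures given above (Hyatt's raw latencies and reads per TLB miss, the 1024-entry TLB, and Vincent's rounded 363 ns and 132 ns results). The function names are made up for illustration, and the simple "reads times raw latency" model is the one Hyatt describes, not a measurement.

```python
# Back-of-the-envelope check of the numbers quoted in this thread.
# Function names are hypothetical; the figures come from the quoted posts.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_miss_cost_ns(raw_latency_ns, memory_reads):
    """Cost of one random access that misses the TLB, assuming every
    memory read involved pays the full raw latency (Hyatt's model)."""
    return raw_latency_ns * memory_reads


def tlb_reach_bytes(entries, page_size_bytes):
    """Amount of memory the TLB can map before it starts missing."""
    return entries * page_size_bytes


# Hyatt's figures: ~70 ns raw and 5 reads on the Opteron,
# ~125 ns raw and 3 reads on the K7.
opteron_ns = tlb_miss_cost_ns(70, 5)
k7_ns = tlb_miss_cost_ns(125, 3)

# 1024 TLB entries with 4 KB pages versus 2 MB pages.
reach_4k = tlb_reach_bytes(1024, 4 * KIB)
reach_2m = tlb_reach_bytes(1024, 2 * MIB)

# Vincent's measured ratio, using the rounded values from his post.
measured_ratio = 363 / 132

if __name__ == "__main__":
    print(f"Opteron TLB-miss access: {opteron_ns} ns")
    print(f"K7 TLB-miss access:      {k7_ns} ns")
    print(f"TLB reach, 4 KB pages:   {reach_4k // MIB} MB")
    print(f"TLB reach, 2 MB pages:   {reach_2m // GIB} GB")
    print(f"Measured K7/Opteron:     {measured_ratio:.2f}x")
```

The two sets of numbers disagree because they measure different things: the model predicts roughly equal cost once the TLB is blown out on both machines, while the dblat runs show the ratio Vincent actually observed on his two boxes.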
Last modified: Thu, 15 Apr 21 08:11:13 -0700