Computer Chess Club Archives

Subject: Re: random access latency opteron versus k7

Author: Slater Wold
Date: 14:45:25 05/30/04
On May 30, 2004 at 16:15:54, Robert Hyatt wrote:

>On May 30, 2004 at 15:41:30, Vincent Diepeveen wrote:
>
>>On May 29, 2004 at 11:30:27, Robert Hyatt wrote:
>>
>>[snip]
>>>See above.  _No_ improvement.  Raw latency on the Opteron is 1/2 the raw latency
>>>on the K7 and Intel boxes, but address mapping adds 2 extra memory accesses on
>>>the Opteron, which does away with any actual advantage...
>>>
>>>
>>>
>>>>
>>>>Software benchmarks such as linbench pump a few gigabytes sequentially
>>>>through the machine and then divide that by the time taken.  That gives you
>>>>bandwidth, and they claim latency = 1/bandwidth.
>>>
>>>
>>>But that is the latency _you_ are quoting when you say the Opteron has 1/2 the
>>>latency of the K7.  In your worst case it is _not_ 1/2.  It is the same.
>>
>>Let me show you the tested facts, K7 versus A64:
>>Single-CPU Opteron at CAS 2.5 versus K7 at CAS 2.5.  Note that the K7 has all
>>memory banks filled; the Opteron does *not*: it has just a single DIMM and runs
>>single-channel, not even dual-channel.  So its real latency is actually better
>>than shown here.  In fact, when I tried a while ago, a quad Opteron tested at
>>120 ns latency for a single CPU.
>>
>>E:\dblat>dblat 300000000
>>Setting up a random access pattern, may take a while
>>Finished
>>Random access:  13.156 s, 131.560 ns/access
>>Testing same pattern again
>>Random access:  13.374 s, 133.740 ns/access
>>Setting up a different random access pattern, may take a while
>>Finished
>>Random access:  13.343 s, 133.430 ns/access
>>Testing same pattern again
>>Random access:  13.265 s, 132.650 ns/access
>>Sequential access offset     1:   0.250 s,   2.500 ns/access
>>Sequential access offset     2:   0.484 s,   4.840 ns/access
>>Sequential access offset     4:   0.875 s,   8.750 ns/access
>>Sequential access offset     8:   1.781 s,  17.810 ns/access
>>Sequential access offset    16:   3.375 s,  33.750 ns/access
>>Sequential access offset    32:   6.265 s,  62.650 ns/access
>>Sequential access offset    64:   6.516 s,  65.160 ns/access
>>Sequential access offset   128:   7.000 s,  70.000 ns/access
>>Sequential access offset   256:   7.938 s,  79.380 ns/access
>>Sequential access offset   512:   9.188 s,  91.880 ns/access
>>Sequential access offset  1024:   9.875 s,  98.750 ns/access
>>
>>Now the dual K7, with all banks filled and A-brand memory.
>>C:\tries>dblat 300000000
>>Setting up a random access pattern, may take a while
>>Finished
>>Random access:  36.266 s, 362.660 ns/access
>>Testing same pattern again
>>Random access:  36.406 s, 364.060 ns/access
>>Setting up a different random access pattern, may take a while
>>Finished
>>Random access:  36.250 s, 362.500 ns/access
>>Testing same pattern again
>>Random access:  36.484 s, 364.840 ns/access
>>Sequential access offset     1:   0.906 s,   9.060 ns/access
>>Sequential access offset     2:   1.766 s,  17.660 ns/access
>>Sequential access offset     4:   3.437 s,  34.370 ns/access
>>Sequential access offset     8:   6.891 s,  68.910 ns/access
>>Sequential access offset    16:  13.875 s, 138.750 ns/access
>>Sequential access offset    32:  19.093 s, 190.930 ns/access
>>Sequential access offset    64:  19.156 s, 191.560 ns/access
>>Sequential access offset   128:  19.328 s, 193.280 ns/access
>>Sequential access offset   256:  19.719 s, 197.190 ns/access
>>Sequential access offset   512:  20.437 s, 204.370 ns/access
>>Sequential access offset  1024:  21.860 s, 218.600 ns/access
>>
>>So the practical difference for computer chess:
>>
>>363 / 132 = 2.75, i.e. 2.75 times lower latency for the Opteron.
>>
>>The on-die memory controller isn't that stupid, is it?
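The program behind the numbers above is not shown in the thread, so the following is only a hypothetical C sketch of the usual way such a random-access latency test is built, not dblat's actual code. It links the slots of an array into one random cycle (Sattolo's algorithm) and then follows the chain, so every read depends on the previous one; a sequential pass over the same array is timed for contrast, because that figure reflects throughput (the "1/bandwidth" number) rather than the latency of a single read. The function names and the xorshift generator are illustrative choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Small xorshift64 PRNG so the sketch does not depend on RAND_MAX. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Sattolo's algorithm: turn next[] into a single random cycle through all
   n slots, so following next[i] from slot 0 visits every slot once. */
static void build_random_cycle(size_t *next, size_t n, uint64_t seed)
{
    assert(n > 1 && seed != 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&seed) % i); /* j < i, never j == i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

static volatile size_t sink; /* keeps the compiler from deleting the loops */

/* Dependent loads: each address is the result of the previous read, so the
   CPU cannot overlap them and every step pays the full memory latency. */
static double random_ns_per_access(const size_t *next, size_t steps)
{
    size_t i = 0;
    clock_t t0 = clock();
    for (size_t s = 0; s < steps; s++)
        i = next[i];
    clock_t t1 = clock();
    sink = i;
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)steps;
}

/* Independent, prefetch-friendly reads: this measures throughput,
   not the latency of one isolated read. */
static double sequential_ns_per_access(const size_t *a, size_t n)
{
    size_t sum = 0;
    clock_t t0 = clock();
    for (size_t k = 0; k < n; k++)
        sum += a[k];
    clock_t t1 = clock();
    sink = sum;
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)n;
}
```

To measure DRAM plus TLB-miss cost rather than cache hits, the array has to be far larger than the caches and the TLB reach, for example a few hundred megabytes obtained with malloc. clock() reports CPU time on POSIX systems, which is close enough for a single-threaded loop like this one.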
>
>Never said it was.  I _did_ say that if you blow out the TLB on the K7 and on
>the Opteron, the average access times are close.
>
>Raw latency on the Opteron is about 70 ns to do _one_ memory read.  Reading a
>randomly accessed word when the TLB misses requires 5 memory reads.  There is
>no way to avoid it, and it is going to cost 350 ns, _period_.  On the K7,
>average latency is about 125 ns to do _one_ memory read.  Reading a randomly
>accessed word when the TLB misses requires 3 memory reads, or about 375 ns.
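The per-miss arithmetic can be written as a tiny cost model. This is a sketch under the worst-case assumption that every page-table read on a TLB miss goes all the way to DRAM; in practice page-table entries can often be served from cache, so treat the result as an upper bound. The level counts (4 for the Opteron in 64-bit long mode, 2 for the 32-bit K7, both with 4 KB pages) are background assumptions chosen to reproduce the 5 and 3 reads quoted above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Worst-case model: one full DRAM read per page-table level,
   plus one more read for the data word itself. */
static double tlb_miss_access_ns(int page_table_levels, double dram_read_ns)
{
    assert(page_table_levels >= 0 && dram_read_ns >= 0.0);
    return (double)(page_table_levels + 1) * dram_read_ns;
}

static void print_tlb_miss_comparison(void)
{
    /* Assumed: 4-level tables on the Opteron (64-bit), 2-level on the K7
       (32-bit), 4 KB pages; per-read latencies are the figures quoted above. */
    printf("Opteron: %.0f ns\n", tlb_miss_access_ns(4, 70.0));  /* 5 reads */
    printf("K7:      %.0f ns\n", tlb_miss_access_ns(2, 125.0)); /* 3 reads */
}
```

With these inputs the model gives 350 ns for the Opteron and 375 ns for the K7, which is the "close" result claimed once the TLB is blown out.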
>
>Those are _real_ numbers, reported by _many_ people including AMD.
>
>
>I have no idea what your program above does, and really don't care.  But the
>Opteron has a much bigger TLB; if you don't blow it out by referring to at
>least 2048 different pages, then you are not comparing apples to apples.  The
>Opteron has 1024 TLB entries, enough to efficiently address 4 MB of RAM
>(1024 * 4 KB pages), or, if your OS is smart enough, 2 GB of RAM (1024
>entries * 2 MB page size).
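The TLB-reach argument is plain multiplication, sketched below in C. The 1024-entry figure is the one quoted in the post and is used here only as an input, not verified; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* TLB reach: how much memory the TLB can map at once without a miss. */
static uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_bytes)
{
    return entries * page_bytes;
}

/* Number of distinct pages a working set of the given size spans. */
static uint64_t pages_spanned(uint64_t working_set_bytes, uint64_t page_bytes)
{
    assert(page_bytes > 0);
    return (working_set_bytes + page_bytes - 1) / page_bytes;
}

/* Nonzero if random access over the working set touches more pages than
   the TLB has entries, so probes will keep missing the TLB. */
static int overflows_tlb(uint64_t working_set_bytes, uint64_t entries,
                         uint64_t page_bytes)
{
    return pages_spanned(working_set_bytes, page_bytes) > entries;
}
```

For a hypothetical 256 MB hash table, pages_spanned(256ULL << 20, 4096) is 65536 pages, far more than 1024 entries, so random probes keep missing the TLB; with 2 MB pages the same table spans only 128 pages.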

http://chessprogramming.org/cccsearch/ccc.php?art_id=306858

It's actually a neat little utility.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.