Computer Chess Club Archives


Subject: Re: How's 66.2ns? ;)

Author: Robert Hyatt

Date: 12:39:39 03/19/03


On March 19, 2003 at 15:02:54, Aaron Gordon wrote:

>On March 19, 2003 at 14:24:46, Robert Hyatt wrote:
>
>>On March 19, 2003 at 14:17:52, Aaron Gordon wrote:
>>
>>>On March 19, 2003 at 13:55:12, Robert Hyatt wrote:
>>>
>>>>On March 19, 2003 at 11:41:01, Aaron Gordon wrote:
>>>>
>>>>>On March 19, 2003 at 11:32:05, Robert Hyatt wrote:
>>>>>
>>>>>>
>>>>>>For those interested, lmbench is pretty easy to run.  I generally install
>>>>>>it, type "make" to compile everything, then type "make results".  This will
>>>>>>ask a few questions; for the specific benchmark, I usually do "HARDWARE" only,
>>>>>>as opposed to all the benchmarks, which measure filesystem speed and a lot of
>>>>>>O/S stuff like context switching time, network latency, etc.
>>>>>>
>>>>>>Once that finishes the first time, you can run it multiple times with "make
>>>>>>rerun", which is always advisable to see if the numbers change very slightly
>>>>>>on the second run, due to the program already being loaded into memory.
>>>>>>
>>>>>>Then "make see".  For latency, look near the bottom.  Here are the specifics
>>>>>>for my two personal machines.
>>>>>>
>>>>>>1.  Sony VAIO super-slim with a PIII/750mhz, and 256mb of SDRAM:
>>>>>>
>>>>>>
>>>>>>Memory latencies in nanoseconds - smaller is better
>>>>>>    (WARNING - may not be correct, check graphs)
>>>>>>------------------------------------------------------------------
>>>>>>Host                 OS   Mhz   L1 $   L2 $    Main mem    Guesses
>>>>>>--------- -------------   ---   ----   ----    --------    -------
>>>>>>scrappy    Linux 2.4.20   744 4.0370 9.4300       130.2
>>>>>>
>>>>>>
>>>>>>2.  Dual PIV xeon 2.8ghz, 1.0gb DDRAM, 400mhz FSB
>>>>>>
>>>>>>Memory latencies in nanoseconds - smaller is better
>>>>>>    (WARNING - may not be correct, check graphs)
>>>>>>------------------------------------------------------------------
>>>>>>Host                 OS   Mhz   L1 $   L2 $    Main mem    Guesses
>>>>>>--------- -------------   ---   ----   ----    --------    -------
>>>>>>crafty     Linux 2.4.20  2788 0.7180 6.5900       151.4
>>>>>>
>>>>>>
>>>>>>Final results, my Sony with SDRAM (known for better latency) reports 130ns,
>>>>>>while my xeon with DDRAM (known for worse latency but not nearly as bad
>>>>>>as RDRAM) reports 151ns.  So it seems that my 120ns number is really wrong.
>>>>>>But not in the direction everyone was claiming.  :)
>>>>>>
>>>>>>If you want to download the benchmark, a search for "lmbench" should get you to
>>>>>>the right place.  I'm running version 3.0.  I don't know if there is a newer
>>>>>>version out.
>>>>>>
>>>>>>It is very interesting to watch it "dig" out your cache line size, TLB size,
>>>>>>etc.  And it also reports on cpu latency for specific instructions.  E.g.,
>>>>>>integer bit instructions take .2ns on my 2.8ghz processor.  That is as
>>>>>>expected, as each int op should buzz thru in 1/2 a clock cycle, and one clock
>>>>>>is 1/2.8 ns (about .36ns).
>>>>>>
>>>>>>Have fun, for those that are interested and those that "doubt".
>>>>>>
>>>>>>:)
>>>>>
>>>>>I ran the tests, Hyatt. Lmbench appears to be wrong. Here is what I have so far.
>>>>>I will post more as I do the tests... These are from LMBench:
>>>>>
>>>>>2.4GHz | 220fsb single-channel | CL2.5 |  66.2ns
>>>>>2.4GHz | 200fsb single-channel | CL2.5 |  73.0ns
>>>>>2.4GHz | 150fsb single-channel | CL2.5 | 102.3ns
>>>>>2.4GHz | 133fsb single-channel | CL2.5 | 114.8ns
>>>>>2.4GHz | 100fsb single-channel | CL2.0 | 123.7ns
>>>>>
>>>>>I will be testing dual-channels here in a moment. Also, I do not believe that
>>>>>LMBench is accurate. I'll do more testing with Sciencemark 2.0 for Windows. From
>>>>>the memory latency tests Matt Taylor and I agree it appears (so far) to be
>>>>>accurate. As for the LMBench test results, I'll tar.gz the results directory and
>>>>>send them to anyone that wishes to have them...
>>>>
>>>>
>>>>lmbench has _never_ been wrong in the past.  It is a very well-known benchmark
>>>>with a couple of journal papers behind it giving the details.  66.2 seems
>>>>wrong, as I mentioned, because the Cray T90 can't break 100ns.  And they are
>>>>_known_ for memory speed and bandwidth, with no cache.
>>>>
>>>>I'm not sure if you are greatly overclocking things or not.  If so, maybe that
>>>>is what is making the 66ns time show up.  I gave the results for my dual xeon
>>>>with DDR ram, and it certainly is nowhere near that.
>>>
>>>I'm overclocking, no doubt. At 220MHz fsb (440DDR) it was showing 66.2ns as I
>>>mentioned before. Just because I'm overclocking doesn't mean the score isn't
>>>right; it just means that in a few years, when you get a system with a bus as
>>>fast as this one, you'll see similar numbers.
>>
>>What it might mean is that your numbers are not reproducible by others, however.
>>We have a memory tester here, and the results are really quite remarkable.  You
>>can plug in a SIMM/DIMM and test it and it will show you the fastest clock speed
>>it can run at.  You can plug in another from the _same_ shipment and get a
>>10-15-20ns difference in speed.
>>
>>No doubt if you worked at a place producing DIMMs, you could screen a set that
>>would run at truly amazing speeds.
>>
>>>Right now you're only sticking to
>>>100 & 133fsb (This is what P4's run at), which is very, very slow. Also the
>>>nForce2 is clearly superior to the VIA chipsets. My KT333 at 200fsb is pulling
>>>around 94-95ns latency; at 200fsb the nForce2 is 73-74ns.
>>
>>Note that 100mhz is not slow if you pump four transactions per bus cycle.
>>That's not half bad and is just as fast as if it were clocked at 400mhz.  Ditto
>>for 133/533.
>
>Well, with Pentium 4's you won't even see close to your true max theoretical
>bandwidth. Intel can say, "400MHz fsb" this and "533MHz fsb" that. You will
>NEVER see it. With the Nforce2 you will see within 2 percent of your absolute
>max. Let's say you run 200fsb, max is 3.2gb/s, with help from the dual-channel
>DDR you'll see about 3.15gb/s. At 220fsb I see 3.5gb/s or so. No P4 can do this.
>Especially at such low latencies.
>
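
The bus numbers traded above can be checked with a little arithmetic.  What
follows is only a rough sketch: it assumes a 64-bit (8-byte) front-side bus
on both platforms, which nobody in the thread actually states, and the
function names are made up for illustration.  Peak bandwidth is taken as
base clock x transfers per clock x bus width.

```python
# Peak theoretical bus bandwidth = base clock x transfers per clock x width.
# ASSUMPTION (not stated in the thread): the data bus is 64 bits (8 bytes) wide.
BUS_WIDTH_BYTES = 8


def effective_mhz(base_clock_mhz, transfers_per_clock):
    """Marketing-style "effective" clock, e.g. 100 MHz quad-pumped -> 400."""
    return base_clock_mhz * transfers_per_clock


def peak_bandwidth_gbs(base_clock_mhz, transfers_per_clock,
                       width_bytes=BUS_WIDTH_BYTES):
    """Peak bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    transfers_per_second = base_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * width_bytes / 1e9


def efficiency(measured_gbs, peak_gbs):
    """Fraction of the theoretical peak that was actually observed."""
    return measured_gbs / peak_gbs


if __name__ == "__main__":
    # (label, base clock in MHz, transfers per clock)
    buses = [
        ("200fsb, double-pumped (DDR)", 200, 2),
        ("220fsb, double-pumped (DDR)", 220, 2),
        ("100fsb, quad-pumped", 100, 4),
        ("133fsb, quad-pumped", 133, 4),
    ]
    for label, clock, pump in buses:
        print(f"{label}: {effective_mhz(clock, pump)} MHz effective, "
              f"{peak_bandwidth_gbs(clock, pump):.2f} GB/s peak")
    # The 3.15gb/s figure quoted above, against the 200fsb peak.
    print(f"3.15 of 3.2 GB/s = {efficiency(3.15, peak_bandwidth_gbs(200, 2)):.1%}")
```

Under that assumption, 200fsb double-pumped and 100fsb quad-pumped both come
out to the same 3.2 GB/s peak, and 220fsb double-pumped comes out to 3.52
GB/s.  That lines up with the 3.2gb/s and 3.5gb/s figures quoted above; how
close a given chipset gets to the peak in practice is a separate,
measured question.
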
>Here are some results from a friend of mine's PC, he's using an Asus A7N8X Deluxe
>(nforce2) and an AthlonXP 2800+ (2.25GHz) at 166fsb(333DDR) with nothing
>overclocked. He said he was running low to moderate memory timings. I'm sure he
>could get below 100ns by switching to CL2 (& fast timings). Also, if he pushed
>up to 200-220fsb like I did he'd start seeing 60-80ns times as well. Here you
>go:
>
>"stride=1024

This looks like L1:

>0.00049 1.333
>0.00098 1.333
>0.00195 1.333
>0.00293 1.333
>0.00391 1.333
>0.00586 1.333
>0.00781 1.333
>0.00977 1.333
>0.01172 1.333
>0.01367 1.333
>0.01562 1.333
>0.01758 1.333
>0.01953 1.333
>0.02148 1.333
>0.02344 1.333
>0.02539 1.333
>0.02734 1.333
>0.02930 1.333
>0.03125 1.333
>0.03516 1.333
>0.03906 1.333
>0.04297 1.333
>0.04688 1.333
>0.05078 1.333
>0.05469 1.333
>0.05859 1.333
>0.06250 1.333
>0.07031 3.716


This starts to miss L1 and hit L2:



>0.07812 5.683
>0.08594 7.458
>0.09375 8.888
>0.10156 8.888
>0.10938 8.888
>0.11719 8.888
>0.12500 8.888
>0.14062 8.888
>0.15625 8.888
>0.17188 8.888
>0.18750 8.888
>0.20312 8.888
>0.21875 8.888
>0.23438 8.888
>0.25000 8.888
>0.28125 9.629
>0.31250 38.464
>0.34375 79.928


and this starts to miss L2 and hit on the raw memory latency:



>0.37500 92.567
>0.40625 98.930
>0.43750 101.473
>0.46875 101.193
>0.50000 101.631
>1.00000 101.582
>1.50000 102.518
>2.00000 102.981
>2.50000 103.132
>3.00000 103.280
>3.50000 102.981
>4.00000 103.471
>5.00000 103.584
>6.00000 103.817
>7.00000 103.420
>8.00000 103.815
>10.00000 103.997
>12.00000 103.967
>14.00000 103.715
>16.00000 105.000
>18.00000 104.937
>20.00000 105.740
>30.00000 106.981
>
>This is from the LMBench result file.
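
The by-eye reading above (an L1 plateau, an L2 plateau, then raw memory) can
also be done mechanically.  The sketch below is a minimal illustration, not
part of lmbench: it assumes the two columns are array size in MB and load
latency in ns, and the 5% tolerance and 3-point minimum are arbitrary
choices.  The sample is a subset of the rows quoted above, and the function
names are hypothetical.

```python
from statistics import median

# Subset of the quoted result rows: "size_in_MB latency_in_ns" (assumed units).
SAMPLE = """\
0.00049 1.333
0.01562 1.333
0.03125 1.333
0.06250 1.333
0.07031 3.716
0.07812 5.683
0.08594 7.458
0.09375 8.888
0.12500 8.888
0.18750 8.888
0.25000 8.888
0.28125 9.629
0.31250 38.464
0.34375 79.928
0.37500 92.567
0.40625 98.930
0.50000 101.631
1.00000 101.582
4.00000 103.471
8.00000 103.815
16.00000 105.000
30.00000 106.981
"""


def parse(text):
    """Return (size_mb, latency_ns) pairs, skipping anything that isn't one."""
    rows = []
    for line in text.splitlines():
        parts = line.lstrip("> ").split()  # tolerate mail-style ">" quoting
        if len(parts) != 2:
            continue
        try:
            rows.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue
    return rows


def find_plateaus(rows, rel_tol=0.05, min_points=3):
    """Group consecutive rows whose latency stays within rel_tol of the
    previous row; keep groups of at least min_points rows.
    Returns (first_size_mb, last_size_mb, median_latency_ns) per plateau."""
    plateaus, run = [], []
    for size, lat in rows:
        if run and abs(lat - run[-1][1]) > rel_tol * run[-1][1]:
            if len(run) >= min_points:
                plateaus.append(run)
            run = []
        run.append((size, lat))
    if len(run) >= min_points:
        plateaus.append(run)
    return [(r[0][0], r[-1][0], median(lat for _, lat in r)) for r in plateaus]


if __name__ == "__main__":
    for first, last, lat in find_plateaus(parse(SAMPLE)):
        print(f"{first:.5f} MB to {last:.5f} MB: about {lat:.3f} ns")
```

On this subset it finds three plateaus: 1.333ns up to 0.0625 MB, 8.888ns from
0.09375 MB to 0.25 MB, and roughly 103ns from 0.40625 MB on up.  The rows in
between are the transitions where some loads still hit the faster level.
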



Last modified: Thu, 15 Apr 21 08:11:13 -0700
