Author: Robert Hyatt
Date: 12:39:39 03/19/03
On March 19, 2003 at 15:02:54, Aaron Gordon wrote:

>On March 19, 2003 at 14:24:46, Robert Hyatt wrote:
>
>>On March 19, 2003 at 14:17:52, Aaron Gordon wrote:
>>
>>>On March 19, 2003 at 13:55:12, Robert Hyatt wrote:
>>>
>>>>On March 19, 2003 at 11:41:01, Aaron Gordon wrote:
>>>>
>>>>>On March 19, 2003 at 11:32:05, Robert Hyatt wrote:
>>>>>
>>>>>>
>>>>>>For those interested, the lmbench is pretty easy to run. I generally
>>>>>>install it, type "make" to compile everything, then type "make results".
>>>>>>This will ask a few questions and for the specific benchmark, I usually
>>>>>>do "HARDWARE" only as opposed to all the benchmarks which measure
>>>>>>filesystem speed, a lot of O/S stuff like context switching time,
>>>>>>network latency, etc.
>>>>>>
>>>>>>Once that finishes the first time, you can run it multiple times with
>>>>>>the "make rerun" which is always advisable to see if the numbers change
>>>>>>very slightly the second run, due to the program already being loaded
>>>>>>into memory.
>>>>>>
>>>>>>Then "make see". For latency, look near the bottom. Here are the
>>>>>>specifics for my two personal machines.
>>>>>>
>>>>>>1. Sony VAIO super-slim with a PIII/750mhz, and 256mb of SDRAM:
>>>>>>
>>>>>>Memory latencies in nanoseconds - smaller is better
>>>>>>    (WARNING - may not be correct, check graphs)
>>>>>>------------------------------------------------------------------
>>>>>>Host      OS            Mhz  L1 $   L2 $   Main mem Guesses
>>>>>>--------- ------------- ---- ------ ------ -------- -------
>>>>>>scrappy   Linux 2.4.20   744 4.0370 9.4300    130.2
>>>>>>
>>>>>>2. Dual PIV xeon 2.8ghz, 1.0gb DDRAM, 400mhz FSB
>>>>>>
>>>>>>Memory latencies in nanoseconds - smaller is better
>>>>>>    (WARNING - may not be correct, check graphs)
>>>>>>------------------------------------------------------------------
>>>>>>Host      OS            Mhz  L1 $   L2 $   Main mem Guesses
>>>>>>--------- ------------- ---- ------ ------ -------- -------
>>>>>>crafty    Linux 2.4.20  2788 0.7180 6.5900    151.4
>>>>>>
>>>>>>Final results, my Sony with SDRAM (known for better latency) reports
>>>>>>130ns, while my xeon with DDRAM (known for worse latency but not nearly
>>>>>>as bad as RDRAM) reports 151ns. So it seems that my 120ns number is
>>>>>>really wrong. But not in the direction everyone was claiming. :)
>>>>>>
>>>>>>If you want to download the benchmark, a search for "lmbench" should get
>>>>>>you to the right place. I'm running version 3.0. I don't know if there
>>>>>>is a newer version out.
>>>>>>
>>>>>>It is very interesting to watch it "dig" out your cache line size, TLB
>>>>>>size, etc. And it also reports on cpu latency for specific instructions.
>>>>>>IE integer bit instructions take .2ns on my 2.8ghz processor. That is as
>>>>>>expected as each int op should buzz thru in 1/2 a clock cycle, which is
>>>>>>1/2.8 ns per clock.
>>>>>>
>>>>>>Have fun, for those that are interested and those that "doubt".
>>>>>>
>>>>>>:)
>>>>>
>>>>>I ran the tests Hyatt. Lmbench appears to be wrong. Here is what I have
>>>>>so far. I will post more as I do the tests... These are from LMBench
>>>>>
>>>>>2.4GHz | 220fsb single-channel | CL2.5 |  66.2ns
>>>>>2.4GHz | 200fsb single-channel | CL2.5 |  73.0ns
>>>>>2.4GHz | 150fsb single-channel | CL2.5 | 102.3ns
>>>>>2.4GHz | 133fsb single-channel | CL2.5 | 114.8ns
>>>>>2.4GHz | 100fsb single-channel | CL2.0 | 123.7ns
>>>>>
>>>>>I will be testing dual-channels here in a moment. Also, I do not believe
>>>>>that LMBench is accurate. I'll do more testing with Sciencemark 2.0 for
>>>>>Windows. From the memory latency tests Matt Taylor and I agree it appears
>>>>>(so far) to be accurate. As for the LMBench test results, I'll tar.gz the
>>>>>results directory and send them to anyone that wishes to have them...
>>>>
>>>>
>>>>lmbench has _never_ been wrong in the past. It is a very well-known
>>>>benchmark with a couple of journal papers behind it giving the details.
>>>>66.2 seems wrong, as I mentioned, because the Cray T90 can't break 100ns.
>>>>And they are _known_ for memory speed bandwidth and no cache.
>>>>
>>>>I'm not sure if you are greatly overclocking things or not. If so, maybe
>>>>that is what is making the 66ns time show up. I gave the results for my
>>>>dual xeon with DDR ram, and it certainly is nowhere near that.
>>>
>>>I'm overclocking, no doubt. At 220MHz fsb (440DDR) it was showing 66.2ns as
>>>I mentioned before. Just because I'm overclocking doesn't mean the score
>>>isn't right, it just means here in a few years when you get a system with a
>>>bus as fast as this one you'll see similar numbers.
>>
>>What it might mean is that your numbers are not reproducible by others,
>>however. We have a memory tester here, and the results are really quite
>>remarkable. You can plug in a SIMM/DIMM and test it and it will show you the
>>fastest clock speed it can run at. You can plug in another from the _same_
>>shipment and get a 10-15-20ns difference in speed.
>>
>>No doubt if you worked at a place producing DIMMs, you could screen a set
>>that would run at truly amazing speeds.
>>
>>>Right now you're only sticking to 100 & 133fsb (This is what P4's run at),
>>>which is very, very slow. Also the nForce2 is clearly superior to the VIA
>>>chipsets. My KT333 at 200fsb is pulling around 94-95ns latency, at 200fsb
>>>the nForce2 is 73-74ns.
>>
>>Note that 100mhz is not slow if you pump four transactions per bus cycle.
>>That's not half bad and is just as fast as if it were clocked at 400mhz.
>>Ditto for 133/533.
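The bus and clock arithmetic quoted above can be written out as a rough sketch. The function names are hypothetical, and the 64-bit (8-byte) data bus is an assumption added for illustration; none of this comes from lmbench itself.

```python
def effective_mtps(base_clock_mhz, transfers_per_clock):
    """Effective transfer rate in millions of transfers per second."""
    return base_clock_mhz * transfers_per_clock


def peak_bandwidth_gbs(base_clock_mhz, transfers_per_clock, bus_width_bytes=8):
    """Theoretical peak in decimal GB/s, assuming a 64-bit (8-byte) data bus."""
    return effective_mtps(base_clock_mhz, transfers_per_clock) * bus_width_bytes / 1000.0


def clock_period_ns(clock_ghz):
    """Length of one CPU clock cycle in nanoseconds."""
    return 1.0 / clock_ghz


# Quad-pumped 100 MHz bus: same transfer rate as a bus clocked at 400 MHz.
print(effective_mtps(100, 4), peak_bandwidth_gbs(100, 4))
# Double-pumped (DDR) 200 MHz bus: also 400 million transfers per second.
print(effective_mtps(200, 2), peak_bandwidth_gbs(200, 2))
# An integer op retiring in half a clock at 2.8 GHz, in ns.
print(round(0.5 * clock_period_ns(2.8), 3))
```

Under these assumptions the quad-pumped 100 MHz bus and the double-pumped 200 MHz bus have the same theoretical peak, and half a clock at 2.8 GHz rounds to the .2ns figure lmbench reported. Peak figures say nothing about latency, which is what the numbers in this thread are measuring.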
>
>Well, with Pentium 4's you won't even see close to your true max theoretical
>bandwidth. Intel can say, "400MHz fsb" this and "533MHz fsb" that. You will
>NEVER see it. With the Nforce2 you will see within 2 percent of your absolute
>max. Let's say you run 200fsb, max is 3.2gb/s, with help from the
>dual-channel DDR you'll see about 3.15gb/s. At 220fsb I see 3.5gb/s or so. No
>P4 can do this. Especially at such low latencies.
>
>Here are some results from a friend of mine's PC, he's using an Asus A7N8X
>Deluxe (nforce2) and an AthlonXP 2800+ (2.25GHz) at 166fsb(333DDR) with
>nothing overclocked. He said he was running low to moderate memory timings.
>I'm sure he could get below 100ns by switching to CL2 (& fast timings). Also,
>if he pushed up to 200-220fsb like I did he'd start seeing 60-80ns times as
>well. Here you go:
>
>"stride=1024 this looks like L1:
>0.00049 1.333
>0.00098 1.333
>0.00195 1.333
>0.00293 1.333
>0.00391 1.333
>0.00586 1.333
>0.00781 1.333
>0.00977 1.333
>0.01172 1.333
>0.01367 1.333
>0.01562 1.333
>0.01758 1.333
>0.01953 1.333
>0.02148 1.333
>0.02344 1.333
>0.02539 1.333
>0.02734 1.333
>0.02930 1.333
>0.03125 1.333
>0.03516 1.333
>0.03906 1.333
>0.04297 1.333
>0.04688 1.333
>0.05078 1.333
>0.05469 1.333
>0.05859 1.333
>0.06250 1.333
>0.07031 3.716 This starts to miss L1 and hit L2:
>0.07812 5.683
>0.08594 7.458
>0.09375 8.888
>0.10156 8.888
>0.10938 8.888
>0.11719 8.888
>0.12500 8.888
>0.14062 8.888
>0.15625 8.888
>0.17188 8.888
>0.18750 8.888
>0.20312 8.888
>0.21875 8.888
>0.23438 8.888
>0.25000 8.888
>0.28125 9.629
>0.31250 38.464
>0.34375 79.928 and this starts to miss L2 and hit on the raw memory latency:
>0.37500 92.567
>0.40625 98.930
>0.43750 101.473
>0.46875 101.193
>0.50000 101.631
>1.00000 101.582
>1.50000 102.518
>2.00000 102.981
>2.50000 103.132
>3.00000 103.280
>3.50000 102.981
>4.00000 103.471
>5.00000 103.584
>6.00000 103.817
>7.00000 103.420
>8.00000 103.815
>10.00000 103.997
>12.00000 103.967
>14.00000 103.715
>16.00000 105.000
>18.00000 104.937
>20.00000 105.740
>30.00000 106.981
>
>This is from the LMBench result file.
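The result file quoted above is a list of (array size in MB, load latency in ns) pairs, and the L1, L2 and main-memory figures are the flat stretches in it. Below is a minimal sketch of pulling those plateaus out automatically. It uses only a subset of the samples quoted above; the function names, the 10% step tolerance and the three-sample minimum are assumptions chosen for illustration, not how lmbench produces its own summary.

```python
from statistics import median

# A subset of the quoted samples, annotations left in place.
RAW = """\
"stride=1024 this looks like L1:
0.00049 1.333
0.00781 1.333
0.03125 1.333
0.06250 1.333
0.07031 3.716 This starts to miss L1 and hit L2:
0.07812 5.683
0.08594 7.458
0.09375 8.888
0.12500 8.888
0.18750 8.888
0.25000 8.888
0.28125 9.629
0.31250 38.464
0.34375 79.928 and this starts to miss L2 and hit on the raw memory latency:
0.37500 92.567
0.40625 98.930
0.50000 101.631
1.00000 101.582
4.00000 103.471
8.00000 103.815
16.00000 105.000
30.00000 106.981
"""


def parse_samples(text):
    """Return (size_mb, latency_ns) pairs, skipping lines that are not data."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        try:
            samples.append((float(fields[0]), float(fields[1])))
        except (IndexError, ValueError):
            continue  # header or blank line
    return samples


def find_plateaus(samples, tolerance=0.10, min_points=3):
    """Group consecutive samples whose latency moves by at most `tolerance`
    (relative to the previous sample); keep runs of at least `min_points`.
    Returns (first_size_mb, last_size_mb, median_latency_ns) per plateau."""
    plateaus, run = [], []
    for size_mb, ns in samples:
        if run and abs(ns - run[-1][1]) > tolerance * run[-1][1]:
            if len(run) >= min_points:
                plateaus.append(run)
            run = []
        run.append((size_mb, ns))
    if len(run) >= min_points:
        plateaus.append(run)
    return [(r[0][0], r[-1][0], median(ns for _, ns in r)) for r in plateaus]


for first_mb, last_mb, ns in find_plateaus(parse_samples(RAW)):
    print(f"{first_mb:9.5f} MB .. {last_mb:9.5f} MB : about {ns:.1f} ns")
```

On this subset the sketch finds three plateaus, which line up with the annotations in the quoted file: roughly 1.3ns, 8.9ns, and a bit over 100ns. The samples on the ramps between plateaus are dropped, since they mix hits and misses and do not describe any one level.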
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.