Author: Robert Hyatt
Date: 14:54:36 06/20/02
On June 20, 2002 at 17:00:28, Eugene Nalimov wrote:

>I am not sure how well your Xeons would scale (1) if they were running at
>1.6GHz, and (2) if they had less L2 cache (I assume they have at least 1Mb,
>right?).

The 700's have 1mb. The 550's I have here (10 quads) have 1/2mb. They scale
similarly. I can't answer what might happen beyond 700 obviously. I think
there are 900's out now with a 100mhz FSB that might work in my machine, I am
not sure. But they are still pricey so I haven't paid much attention.

>
>All "server" CPUs usually have much more cache than desktop ones, and clock much
>slower (higher frequency will not help them as much as desktop ones due to
>"memory wall", due to the extra verification they are several speed grades
>behind the cutting edge desktop CPUs, they must be more reliable --> less heat
>is allowed, etc.).
>
>Eugene
>
>On June 20, 2002 at 15:53:57, Robert Hyatt wrote:
>
>>On June 20, 2002 at 13:45:09, Eugene Nalimov wrote:
>>
>>>I strongly suspect that is caused by the inadequate memory subsystem that is not
>>>scalable enough. I just ran 'bench' on Crafty 18.13 on the dual AMD-1600 system
>>>(it's officially called MP-1900+).
>>>
>>>One CPU used: 920knps
>>>Two CPUs used: 1,300knps
>>
>>
>>OK... that is certainly possible. The only dual I have personally used was
>>a dual PII/300 several years back. It scaled pretty well, but then 300mhz
>>didn't exactly strain memory.
>>
>>The above numbers you posted really are ugly compared to the 1/2/3/4 cpu
>>numbers on my quad with 4-way interleaving..
>>
>>
>>>
>>>Eugene
>>>
>>>On June 20, 2002 at 11:43:24, Robert Hyatt wrote:
>>>
>>>>On June 20, 2002 at 11:01:12, Brian Richardson wrote:
>>>>
>>>>>On June 19, 2002 at 23:24:23, Robert Hyatt wrote:
>>>>>
>>>>>>On June 19, 2002 at 22:03:07, Brian Richardson wrote:
>>>>>>
>>>>>>>Alpha
>>>>>>>1 cpu 21264/600mhz:
>>>>>>>total positions searched.......... 300
>>>>>>>number right...................... 300
>>>>>>>number wrong...................... 0
>>>>>>>percentage right.................. 100
>>>>>>>percentage wrong.................. 0
>>>>>>>total nodes searched.............. 236973211.0
>>>>>>>average search depth.............. 4.5
>>>>>>>nodes per second.................. 783641
>>>>>>>
>>>>>>>2 cpus, 21264/600mhz:
>>>>>>>total positions searched.......... 300
>>>>>>>number right...................... 300
>>>>>>>number wrong...................... 0
>>>>>>>percentage right.................. 100
>>>>>>>percentage wrong.................. 0
>>>>>>>total nodes searched.............. 330905102.0
>>>>>>>average search depth.............. 4.5
>>>>>>>nodes per second.................. 1266767
>>>>>>>
>>>>>>>AMD 1900+MP
>>>>>>>max threads set to 2
>>>>>>>hash table memory = 384M bytes.
>>>>>>>pawn hash table memory = 32M bytes.
>>>>>>>pondering disabled.
>>>>>>>Crafty v16.19 (2 cpus)
>>>>>>>test results summary:
>>>>>>>total positions searched.......... 300
>>>>>>>number right...................... 300
>>>>>>>number wrong...................... 0
>>>>>>>percentage right.................. 100
>>>>>>>percentage wrong.................. 0
>>>>>>>total nodes searched.............. 19013488028.0
>>>>>>>average search depth.............. 12.2
>>>>>>>nodes per second.................. 1357144
>>>>>>>(run without test xxx n, st=60)
>>>>>>>
>>>>>>>1 CPU
>>>>>>>total positions searched.......... 300
>>>>>>>number right...................... 300
>>>>>>>number wrong...................... 0
>>>>>>>percentage right.................. 100
>>>>>>>percentage wrong.................. 0
>>>>>>>total nodes searched.............. 4639292700.0
>>>>>>>average search depth.............. 9.7
>>>>>>>nodes per second.................. 960490
>>>>>>>(run with test xxx n=8)
>>>>>>
>>>>>>
>>>>>>I am _totally_ confused now. The alpha did 800K with 1 cpu, 1200K with
>>>>>>two. We discovered the "locking" problem and eliminated it, which made
>>>>>>the NPS scale like it should later. The 2 cpu = 1.5x faster was a clue
>>>>>>in that NPS (for crafty) scales linearly with number of processors, although
>>>>>>search overhead makes some of that NPS wasted.
>>>>>>
>>>>>>For your results, your 1 cpu number is 960K and your two cpu result
>>>>>>is 1300K. That doesn't look reasonable. An AMD dual should see the
>>>>>>NPS almost exactly double using two cpus.
>>>>>>
>>>>>>Can you clarify your numbers above or am I mis-reading???
>>>>>
>>>>>I have a hunch about what might be going on. The Alpha results above show an
>>>>>average search depth of 4.5, which means the test xxx n command (n is stop each
>>>>>test after n plys correct) was probably used with n=2 (per your other email and
>>>>>a test I also ran). I suspect this runs each test for a much shorter time than
>>>>>the longer runs, which results in significantly lower average nps results for
>>>>>the entire suite, given other overhead. I also think this is behind the AMD
>>>>>scaling looking relatively poor, since the 2 CPU run was with just st=60 and no
>>>>>"n", which takes 5-6 hours, and the 1 CPU result which was one I tried to do
>>>>>"quickly" last night with n=8 (after observing odd results with an n=2 run).
>>>>>All of this is with version 16.19, which of course does not have the xor
>>>>>lockless hashing. It is probably not worthwhile going much further, since
>>>>>reproducing Alpha results would be difficult. My feeling at this point is that
>>>>>AMD today is roughly comparable to older Alphas, but either way I still believe
>>>>>64 bits is the way to go.
>>>>>Brian
>>>>
>>>>
>>>>I just checked the alpha logs. The default "2" value was used which means
>>>>many searches ended quickly. That does in fact lower the NPS value
>>>>significantly, due to time quantization errors mainly. However, for the alpha,
>>>>_both_ runs used the same set-up. If you run on your AMD, using "2", for
>>>>mt=0 and mt=2, you _ought_ to see the mt=2 NPS roughly 2x the mt=0 NPS, less
>>>>the penalty caused by insufficient memory bandwidth vs L1/L2 cache sizes.
>>>>
>>>>IE here are some numbers for my quad xeon, one test position, 1,2,3 and 4
>>>>processors: (NPS values only)
>>>>
>>>>1cpu: 377K
>>>>2cpu: 710K
>>>>3cpu: 1037K
>>>>4cpu: 1347K
>>>>
>>>>fairly close to uniform. Perfect for 2 cpus would be 2*377 of course,
>>>>but the PC can't quite deliver that bandwidth. Close however.
>>>>
>>>>Optimal would be 754K for 2, 1131K for 3 and 1508K for 4. Note that this
>>>>is for a box with 4-way interleaving. A dual won't have that, typically.
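
For anyone who wants to compare the scaling figures quoted in this thread on the same footing, here is a minimal sketch in Python. It is not part of Crafty; the function name `scaling` and the dictionary layout are made up for illustration. It takes NPS values (in K nodes/sec) keyed by processor count, treats N times the 1-cpu figure as "optimal", and reports the speedup and the fraction of optimal actually reached. The inputs are the quad Xeon numbers and the dual AMD 'bench' numbers posted above.

```python
def scaling(nps_by_cpus):
    """Return {cpus: (ideal_nps, speedup, efficiency)} relative to the 1-cpu NPS.

    ideal_nps  = 1-cpu NPS * cpus (perfect linear scaling)
    speedup    = measured NPS / 1-cpu NPS
    efficiency = speedup / cpus (1.0 would be perfect)
    """
    base = nps_by_cpus[1]
    result = {}
    for cpus, nps in sorted(nps_by_cpus.items()):
        speedup = nps / base
        result[cpus] = (base * cpus, speedup, speedup / cpus)
    return result


# Figures taken from the posts in this thread, in K nodes per second.
quad_xeon = {1: 377, 2: 710, 3: 1037, 4: 1347}  # one test position
dual_amd = {1: 920, 2: 1300}                    # Crafty 18.13 'bench'

if __name__ == "__main__":
    for name, data in (("quad xeon", quad_xeon), ("dual AMD", dual_amd)):
        print(name)
        for cpus, (ideal, speedup, eff) in scaling(data).items():
            print(f"  {cpus} cpu: measured {data[cpus]}K, optimal {ideal}K, "
                  f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

This only compares raw NPS. It says nothing about search overhead, and it is only meaningful when both runs use the same test settings, which is the point made above about the "n" value.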
Last modified: Thu, 15 Apr 21 08:11:13 -0700