Computer Chess Club Archives


Subject: Re: 64 bits

Author: Eugene Nalimov

Date: 14:00:28 06/20/02


I am not sure how well your Xeons would scale (1) if they were running at
1.6GHz, and (2) if they had less L2 cache (I assume they have at least 1MB,
right?).

"Server" CPUs usually have much more cache than desktop ones and are clocked much
slower: a higher frequency would not help them as much as it helps desktop CPUs
because of the "memory wall"; the extra verification keeps them several speed
grades behind the cutting-edge desktop CPUs; and they must be more reliable, so
less heat is allowed, etc.

Eugene
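
The "memory wall" point can be made concrete with a toy model. This is only a sketch: it assumes the time per node splits into core work that scales with clock frequency and a memory-stall term that does not, and the cycle count and stall times below are hypothetical, not measurements from any CPU discussed here.

```python
def node_time_ns(freq_ghz, cycles_per_node, mem_stall_ns):
    """Time per node: core cycles shrink with clock speed, memory stalls (assumed) do not."""
    return cycles_per_node / freq_ghz + mem_stall_ns


def speedup(f_old_ghz, f_new_ghz, cycles_per_node, mem_stall_ns):
    """Speedup obtained by raising the clock from f_old_ghz to f_new_ghz."""
    old = node_time_ns(f_old_ghz, cycles_per_node, mem_stall_ns)
    new = node_time_ns(f_new_ghz, cycles_per_node, mem_stall_ns)
    return old / new


# Hypothetical workloads: same core work, different time stalled on memory.
for label, stall_ns in (("cache-friendly", 0.0), ("memory-bound", 500.0)):
    print(f"{label}: doubling the clock gives {speedup(1.0, 2.0, 1000, stall_ns):.2f}x")
```

In this model the larger the stall term, the less a faster clock buys, which is one way to read the trade of clock speed for cache on server parts.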

On June 20, 2002 at 15:53:57, Robert Hyatt wrote:

>On June 20, 2002 at 13:45:09, Eugene Nalimov wrote:
>
>>I strongly suspect that is caused by the inadequate memory subsystem that is not
>>scalable enough. I just ran 'bench' on Crafty 18.13 on the dual AMD-1600 system
>>(it's officially called MP-1900+).
>>
>>One CPU used:    920knps
>>Two CPUs used: 1,300knps
>
>
>OK... that is certainly possible.  The only dual I have personally used was
>a dual PII/300 several years back.  It scaled pretty well, but then 300mhz
>didn't exactly strain memory.
>
>The above numbers you posted really are ugly compared to the 1/2/3/4 cpu
>numbers on my quad with 4-way interleaving.
>
>
>>
>>Eugene
>>
>>On June 20, 2002 at 11:43:24, Robert Hyatt wrote:
>>
>>>On June 20, 2002 at 11:01:12, Brian Richardson wrote:
>>>
>>>>On June 19, 2002 at 23:24:23, Robert Hyatt wrote:
>>>>
>>>>>On June 19, 2002 at 22:03:07, Brian Richardson wrote:
>>>>>
>>>>>>Alpha
>>>>>>1 cpu  21264/600mhz:
>>>>>>total positions searched..........         300
>>>>>>number right......................         300
>>>>>>number wrong......................           0
>>>>>>percentage right..................         100
>>>>>>percentage wrong..................           0
>>>>>>total nodes searched.............. 236973211.0
>>>>>>average search depth..............         4.5
>>>>>>nodes per second..................      783641
>>>>>>
>>>>>>2 cpus, 21264/600mhz:
>>>>>>total positions searched..........         300
>>>>>>number right......................         300
>>>>>>number wrong......................           0
>>>>>>percentage right..................         100
>>>>>>percentage wrong..................           0
>>>>>>total nodes searched.............. 330905102.0
>>>>>>average search depth..............         4.5
>>>>>>nodes per second..................     1266767
>>>>>>
>>>>>>AMD 1900+MP
>>>>>>max threads set to 2
>>>>>>hash table memory = 384M bytes.
>>>>>>pawn hash table memory = 32M bytes.
>>>>>>pondering disabled.
>>>>>>Crafty v16.19 (2 cpus)
>>>>>>test results summary:
>>>>>>total positions searched.......... 300
>>>>>>number right...................... 300
>>>>>>number wrong...................... 0
>>>>>>percentage right.................. 100
>>>>>>percentage wrong.................. 0
>>>>>>total nodes searched.............. 19013488028.0
>>>>>>average search depth.............. 12.2
>>>>>>nodes per second.................. 1357144
>>>>>>(run without test xxx n, st=60)
>>>>>>
>>>>>>1 CPU
>>>>>>total positions searched..........         300
>>>>>>number right......................         300
>>>>>>number wrong......................           0
>>>>>>percentage right..................         100
>>>>>>percentage wrong..................           0
>>>>>>total nodes searched..............4639292700.0
>>>>>>average search depth..............         9.7
>>>>>>nodes per second..................      960490
>>>>>>(run with test xxx n=8)
>>>>>
>>>>>
>>>>>I am _totally_ confused now.  The alpha did 800K with 1 cpu, 1200K with
>>>>>two.  We discovered the "locking" problem and eliminated it, which made
>>>>>the NPS scale like it should later.  The 2 cpu = 1.5x faster was a clue
>>>>>in that NPS (for crafty) scales linearly with number of processors, although
>>>>>search overhead makes some of that NPS wasted.
>>>>>
>>>>>For your results, your 1 cpu number is 960K and your two cpu result
>>>>>is 1300K.  That doesn't look reasonable.  An AMD dual should see the
>>>>>NPS almost exactly double using two cpus.
>>>>>
>>>>>Can you clarify your numbers above or am I mis-reading???
>>>>
>>>>I have a hunch about what might be going on.  The Alpha results above show an
>>>>average search depth of 4.5, which means the test xxx n command (n means: stop
>>>>each test after n plies correct) was probably used with n=2 (per your other email and
>>>>a test I also ran).  I suspect this runs each test for a much shorter time than
>>>>the longer runs, which results in significantly lower average nps results for
>>>>the entire suite, given other overhead.  I also think this is behind the AMD
>>>>scaling looking relatively poor, since the 2 CPU run was with just st=60 and no
>>>>"n", which takes 5-6 hours, and the 1 CPU result which was one I tried to do
>>>>"quickly" last night with n=8 (after observing odd results with an n=2 run).
>>>>All of this is with version 16.19, which of course does not have the xor
>>>>lockless hashing.  It is probably not worthwhile going much further, since
>>>>reproducing Alpha results would be difficult.  My feeling at this point is that
>>>>AMD today is roughly comparable to older Alphas, but either way I still believe
>>>>64 bits is the way to go.
>>>>Brian
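
For readers unfamiliar with the "xor lockless hashing" mentioned above, here is a minimal sketch of the general idea, not Crafty's actual code. Each table slot holds two words, `key ^ data` and `data`, and a probe trusts the slot only if XOR-ing the two words gives back the key. The class name, table size, and the way the torn write is simulated are all assumptions made for illustration; a real engine would do this with two 64-bit words in shared memory.

```python
class LocklessTable:
    """Toy transposition table: each slot holds two words, (key ^ data, data)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size

    def store(self, key, data):
        # No lock is taken; the XOR check on probe detects inconsistent slots.
        self.slots[key % self.size] = (key ^ data, data)

    def probe(self, key):
        slot = self.slots[key % self.size]
        if slot is None:
            return None
        check, data = slot
        # Trust the slot only if its two words are consistent with this key.
        return data if (check ^ data) == key else None


table = LocklessTable(1024)
key_a, data_a = 0x123456789ABCDEF0, 0x2A
key_b, data_b = key_a + 1024, 0x63  # maps to the same slot as key_a

table.store(key_a, data_a)
assert table.probe(key_a) == data_a

# Simulate a torn write: first word from one writer, second word from another.
table.slots[key_a % table.size] = (key_a ^ data_a, data_b)
assert table.probe(key_a) is None
assert table.probe(key_b) is None
```

A torn entry is simply treated as a miss, so no lock is needed around the table.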
>>>
>>>
>>>I just checked the alpha logs.  The default "2" value was used, which means
>>>many searches ended quickly.  That does in fact lower the NPS value
>>>significantly, due to time quantization errors mainly.  However, for the alpha,
>>>_both_ runs used the same set-up.  If you run on your AMD, using "2", for
>>>mt=0 and mt=2, you _ought_ to see the mt=2 NPS roughly 2x the mt=0 NPS, less
>>>the penalty caused by insufficient memory bandwidth vs L1/L2 cache sizes.
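
The effect of time quantization on short searches can be sketched as follows. The 10 ms tick, the rounding-up of elapsed time, and the engine speed are all assumptions for illustration; this is not a description of how Crafty actually reads the clock.

```python
def reported_nps(nodes, elapsed_ms, tick_ms=10):
    """NPS as reported when elapsed time is rounded up to whole clock ticks (assumed)."""
    ticks = max(1, -(-elapsed_ms // tick_ms))  # ceiling division, at least one tick
    return nodes * 1000 / (ticks * tick_ms)


TRUE_NODES_PER_MS = 1000  # hypothetical engine speed: 1,000,000 nodes per second

for elapsed_ms in (3, 55, 1_000, 60_000):
    nodes = TRUE_NODES_PER_MS * elapsed_ms
    print(f"{elapsed_ms:6d} ms search: reported {reported_nps(nodes, elapsed_ms):,.0f} nps")
```

Under these assumptions the relative error is bounded by roughly one tick divided by the search time, so it matters for searches that end in a few milliseconds and vanishes for 60-second searches.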
>>>
>>>For example, here are some numbers for my quad Xeon, one test position, with
>>>1, 2, 3 and 4 processors (NPS values only):
>>>
>>>1cpu:  377K
>>>2cpu:  710K
>>>3cpu: 1037K
>>>4cpu: 1347K
>>>
>>>fairly close to uniform.  Perfect for 2 cpus would be 2*377 of course,
>>>but the PC can't quite deliver that bandwidth.  Close however.
>>>
>>>Optimal would be 754K for 2, 1131K for 3 and 1508K for 4.  Note that this
>>>is for a box with 4-way interleaving.  A dual won't have that, typically.
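
The arithmetic behind the "optimal" figures above can be reproduced with a short script. The NPS values are the quad Xeon numbers quoted in the thread; the function name and the definition of efficiency as measured NPS divided by N times the 1-CPU NPS are choices made for this sketch.

```python
# NPS figures (in thousands) quoted above for the quad Xeon, keyed by CPU count.
nps_k = {1: 377, 2: 710, 3: 1037, 4: 1347}


def scaling_table(nps):
    """Return (cpus, measured, ideal, efficiency) rows relative to the 1-CPU figure."""
    base = nps[1]
    rows = []
    for cpus in sorted(nps):
        ideal = base * cpus  # perfect linear scaling
        rows.append((cpus, nps[cpus], ideal, nps[cpus] / ideal))
    return rows


for cpus, measured, ideal, eff in scaling_table(nps_k):
    print(f"{cpus} cpu: {measured:5d}K measured, {ideal:5d}K ideal, {eff:.1%} efficiency")
```

The same function applied to the dual AMD figures from this thread, `scaling_table({1: 920, 2: 1300})`, gives about 71% efficiency at two CPUs, against roughly 94% for the quad Xeon at two CPUs.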


