Computer Chess Club Archives


Subject: Re: 64 bits

Author: Robert Hyatt

Date: 14:54:36 06/20/02


On June 20, 2002 at 17:00:28, Eugene Nalimov wrote:

>I am not sure how well your Xeons would scale (1) if they were running at
>1.6GHz, and (2) if they had less L2 cache (I assume they have at least 1Mb,
>right?).

The 700's have 1mb.  The 550's I have here (10 quads) have 1/2mb.  They scale
similarly.  I can't answer what might happen beyond 700 obviously.  I think
there are 900's out now with a 100mhz FSB that might work in my machine, I
am not sure.  But they are still pricey so I haven't paid much attention.


>
>All "server" CPUs usually have much more cache than desktop ones, and clock much
>slower (higher frequency will not help them as much as desktop ones due to the
>"memory wall"; due to the extra verification they are several speed grades
>behind the cutting-edge desktop CPUs; they must be more reliable --> less heat
>is allowed, etc.).
>
>Eugene
>
>On June 20, 2002 at 15:53:57, Robert Hyatt wrote:
>
>>On June 20, 2002 at 13:45:09, Eugene Nalimov wrote:
>>
>>>I strongly suspect that is caused by the inadequate memory subsystem that is not
>>>scalable enough. I just ran 'bench' on Crafty 18.13 on the dual AMD-1600 system
>>>(it's officially called MP-1900+).
>>>
>>>One CPU used:    920knps
>>>Two CPUs used: 1,300knps
>>
>>
>>OK... that is certainly possible.  The only dual I have personally used was
>>a dual PII/300 several years back.  It scaled pretty well, but then 300mhz
>>didn't exactly strain memory.
>>
>>The above numbers you posted really are ugly compared to the 1/2/3/4 cpu
>>numbers on my quad with 4-way interleaving..
>>
>>
>>>
>>>Eugene
>>>
>>>On June 20, 2002 at 11:43:24, Robert Hyatt wrote:
>>>
>>>>On June 20, 2002 at 11:01:12, Brian Richardson wrote:
>>>>
>>>>>On June 19, 2002 at 23:24:23, Robert Hyatt wrote:
>>>>>
>>>>>>On June 19, 2002 at 22:03:07, Brian Richardson wrote:
>>>>>>
>>>>>>>Alpha
>>>>>>>1 cpu  21264/600mhz:
>>>>>>>total positions searched..........         300
>>>>>>>number right......................         300
>>>>>>>number wrong......................           0
>>>>>>>percentage right..................         100
>>>>>>>percentage wrong..................           0
>>>>>>>total nodes searched.............. 236973211.0
>>>>>>>average search depth..............         4.5
>>>>>>>nodes per second..................      783641
>>>>>>>
>>>>>>>2 cpus, 21264/600mhz:
>>>>>>>total positions searched..........         300
>>>>>>>number right......................         300
>>>>>>>number wrong......................           0
>>>>>>>percentage right..................         100
>>>>>>>percentage wrong..................           0
>>>>>>>total nodes searched.............. 330905102.0
>>>>>>>average search depth..............         4.5
>>>>>>>nodes per second..................     1266767
>>>>>>>
>>>>>>>AMD 1900+MP
>>>>>>>max threads set to 2
>>>>>>>hash table memory = 384M bytes.
>>>>>>>pawn hash table memory = 32M bytes.
>>>>>>>pondering disabled.
>>>>>>>Crafty v16.19 (2 cpus)
>>>>>>>test results summary:
>>>>>>>total positions searched.......... 300
>>>>>>>number right...................... 300
>>>>>>>number wrong...................... 0
>>>>>>>percentage right.................. 100
>>>>>>>percentage wrong.................. 0
>>>>>>>total nodes searched.............. 19013488028.0
>>>>>>>average search depth.............. 12.2
>>>>>>>nodes per second.................. 1357144
>>>>>>>(run without test xxx n, st=60)
>>>>>>>
>>>>>>>1 CPU
>>>>>>>total positions searched..........         300
>>>>>>>number right......................         300
>>>>>>>number wrong......................           0
>>>>>>>percentage right..................         100
>>>>>>>percentage wrong..................           0
>>>>>>>total nodes searched..............4639292700.0
>>>>>>>average search depth..............         9.7
>>>>>>>nodes per second..................      960490
>>>>>>>(run with test xxx n=8)
>>>>>>
>>>>>>
>>>>>>I am _totally_ confused now.  The alpha did 800K with 1 cpu, 1200K with
>>>>>>two.  We discovered the "locking" problem and eliminated it, which made
>>>>>>the NPS scale like it should later.  The 2 cpu = 1.5x faster was a clue
>>>>>>in that NPS (for crafty) scales linearly with number of processors, although
>>>>>>search overhead makes some of that NPS wasted.
>>>>>>
>>>>>>For your results, your 1 cpu number is 960K and your two cpu result
>>>>>>is 1300K.  That doesn't look reasonable.  An AMD dual should see the
>>>>>>NPS almost exactly double using two cpus.
>>>>>>
>>>>>>Can you clarify your numbers above or am I mis-reading???
>>>>>
>>>>>I have a hunch about what might be going on.  The Alpha results above show an
>>>>>average search depth of 4.5, which means the test xxx n command (n is stop each
>>>>>test after n plies correct) was probably used with n=2 (per your other email and
>>>>>a test I also ran).  I suspect this runs each test for a much shorter time than
>>>>>the longer runs, which results in significantly lower average nps results for
>>>>>the entire suite, given other overhead.  I also think this is behind the AMD
>>>>>scaling looking relatively poor, since the 2 CPU run was with just st=60 and no
>>>>>"n", which takes 5-6 hours, and the 1 CPU result which was one I tried to do
>>>>>"quickly" last night with n=8 (after observing odd results with an n=2 run).
>>>>>All of this is with version 16.19, which of course does not have the xor
>>>>>lockless hashing.  It is probably not worthwhile going much further, since
>>>>>reproducing Alpha results would be difficult.  My feeling at this point is that
>>>>>AMD today is roughly comparable to older Alphas, but either way I still believe
>>>>>64 bits is the way to go.
>>>>>Brian
>>>>
>>>>
>>>>I just checked the alpha logs.  The default "2" value was used which means
>>>>many searches ended quickly.  That does in fact lower the NPS value
>>>>significantly, due to time quantization errors mainly.  However, for the alpha,
>>>>_both_ runs used the same set-up.  If you run on your AMD, using "2", for
>>>>mt=0 and mt=2, you _ought_ to see the mt=2 NPS roughly 2x the mt=0 NPS, less
>>>>the penalty caused by insufficient memory bandwidth vs L1/L2 cache sizes.
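A rough illustration of the time quantization point, as a sketch only. It assumes the elapsed time of each search is rounded up to a whole clock tick of 10 ms; that rounding rule, the tick size, and the name `measured_nps` are assumptions made for illustration, not a description of how Crafty actually keeps time.

```python
def measured_nps(true_rate_nps, true_ms, tick_ms=10):
    """NPS as reported when elapsed time is rounded up to a whole clock tick.

    Assumed model: the engine really searches at true_rate_nps for true_ms
    milliseconds, but the clock only advances in steps of tick_ms.
    """
    nodes = true_rate_nps * true_ms // 1000
    ticks = -(-true_ms // tick_ms)      # ceiling division
    measured_ms = ticks * tick_ms
    return nodes * 1000 // measured_ms


# A search that ends almost immediately versus a 60 second search,
# both at a hypothetical true rate of 1,000,000 nodes per second.
print(measured_nps(1_000_000, 13))
print(measured_nps(1_000_000, 60_000))
```

Under this model the 13 ms search is charged for 20 ms and reports well below the true rate, while the 60 second search is essentially unaffected. That is the same direction as the effect described above: a suite where many searches end quickly reports a lower average NPS.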
>>>>
>>>>E.g., here are some numbers for my quad xeon, one test position, 1,2,3 and 4
>>>>processors:  (NPS values only)
>>>>
>>>>1cpu:  377K
>>>>2cpu:  710K
>>>>3cpu: 1037K
>>>>4cpu: 1347K
>>>>
>>>>fairly close to uniform.  Perfect for 2 cpus would be 2*377 of course,
>>>>but the PC can't quite deliver that bandwidth.  Close however.
>>>>
>>>>Optimal would be 754K for 2, 1131K for 3 and 1508K for 4.  Note that this
>>>>is for a box with 4-way interleaving.  A dual won't have that, typically.
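The "optimal" figures and the shortfall in the quad Xeon numbers above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `scaling_report` and the dictionary layout are illustrative choices, and the only data used are the four NPS values quoted in the post.

```python
def scaling_report(nps_by_cpus):
    """Return {cpus: (ideal_nps, speedup, efficiency)} relative to the 1-CPU NPS.

    ideal_nps  = 1-CPU NPS times the number of CPUs (perfect linear scaling)
    speedup    = measured NPS divided by 1-CPU NPS
    efficiency = speedup divided by the number of CPUs
    """
    base = nps_by_cpus[1]
    report = {}
    for cpus, nps in sorted(nps_by_cpus.items()):
        ideal = base * cpus
        speedup = nps / base
        report[cpus] = (ideal, speedup, speedup / cpus)
    return report


# NPS values (in thousands) quoted above for the quad Xeon.
quad_xeon = {1: 377, 2: 710, 3: 1037, 4: 1347}

for cpus, (ideal, speedup, eff) in scaling_report(quad_xeon).items():
    print(f"{cpus} cpu: ideal {ideal}K, speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

The ideal column matches the 754K, 1131K and 1508K given in the post, and the efficiency falls from about 94% on two processors to about 89% on four. The same function can be fed the dual AMD or Alpha figures from earlier in the thread for comparison.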



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.