Computer Chess Club Archives



Subject: Re: Hammer info. And some SMP musings.

Author: Robert Hyatt

Date: 07:32:08 03/24/02



On March 24, 2002 at 09:49:34, Vincent Diepeveen wrote:

>On March 24, 2002 at 00:00:30, Robert Hyatt wrote:
>
>>On March 23, 2002 at 17:21:10, Slater Wold wrote:
>>
>>>On March 23, 2002 at 17:07:53, Sune Fischer wrote:
>>>
>>>>On March 23, 2002 at 15:58:19, Tom Kerrigan wrote:
>>>>
>>>>>On March 23, 2002 at 09:53:13, Dan Andersson wrote:
>>>>>
>>>>>>As seen in:
>>>>>>http://www.aceshardware.com/read.jsp?id=45000312
>>>>>>A chess program using traditional work scheduling algorithms will not be using
>>>>>>the Hammer architecture at its most effective. But it won't be all that bad due
>>>>>>to the HyperTransport tunnels. And high bandwidth memory. A funny consequence of
>>>>>>the architecture is that SMP multiprocessing is achieved by having software
>>>>>>drivers.
>>>>>
>>>>>I don't know what you mean by "traditional work scheduling algorithms" but the
>>>>>Hammer will be great for running chess programs out of the box. The only way to
>>>>>make it faster would be to recompile the programs for x86-64, which reportedly
>>>>>yields a 10-15% performance gain.
>>>>
>>>>The Hammer is a 64-bit chip, I expect it to bring a lot more than just 10-15% in
>>>>chess, more like 100-150% for those progs with bitboards.
>>>>
>>>>-S.
>>>
>>>You're dreaming.  Alpha's don't get *anywhere* near that kind of gain.  More
>>>like the 10-15% that Tom said.
>>>
>>
>>
>>Depends.  Tim Mann produced > 1M nodes per second on a 600mhz alpha.  NO
>>600 mhz Intel will come within 1/2 that total...
>
>http://www.specbench.org/cgi-bin/osgresults?conf=cint2000
>
>the fastest Alpha:
>
>http://www.specbench.org/osg/cpu2000/results/res2001q4/cpu2000-20011022-01046.html
>
>4 CPUs in total, 8 MB L2 cache per cpu, and 1 cpu enabled,
>which probably means that the cpu running the crafty benchmark
>was PROFITING from the other 3 cpus' L2 cache too (classical trick)



This has nothing to do with the test Tim ran.  He had a simple alpha workstation
_on_ _his_ _desk_.  Not an 8 cpu machine.  Just a workstation.


>
>So it was using in total 32MB L2 cache where 1 cpu has 8 MB.


Wrong.  In fact, the alpha Tim used didn't even have 8MB of L3 (not L2)
cache.  On the alpha, L1 and L2 are both small and on the cpu die itself.
L3 is off-chip.



>
>Despite that at 1 Ghz its performance for crafty base runtime is 122.
>
>Note this is a very recent test. November 2001.
>
>Now latest result for K7 processors which are 32 bits:
>
>http://www.specbench.org/osg/cpu2000/results/res2002q1/cpu2000-20020114-01202.html
>
>So this is a single cpu system. No cheating doing test on a quad like
>alpha did (or SUN/IBM keep doing).
>
>Base runtime 102.
>
>So in short, the 1Ghz alpha, with 64-bit registers,
>4 instructions a clock, and cheating with L2 cache,
>ends up 100% x (122/102) - 100% = 19.6% slower
>than a single-cpu processor clocked at 1.667Ghz


I'm not worried about _any_ of those numbers.  I am using the real numbers
Tim got when he was running Crafty (and gnuchessx) on ICC last year.  I watched
games, and saw the NPS, and asked him about it.  He sent me a lot of output.
I just took a quick glance and found he sent me two different sets of output
from two different machines.  The first was a single-cpu 600mhz 21264, which
produced .8M nodes per second.  The other machine was a dual-cpu box (that he
couldn't use as much) which produced a faster result but which also led us to
the "lockless" hashtable to improve performance.

Here is the 21264 single-cpu output (600mhz):

total positions searched..........         300
number right......................         300
number wrong......................           0
percentage right..................         100
percentage wrong..................           0
total nodes searched.............. 236973211.0
average search depth..............         4.5
nodes per second..................      783641


Here is the dual 21264 output (from running wac):

total positions searched..........         300
number right......................         300
number wrong......................           0
percentage right..................         100
percentage wrong..................           0
total nodes searched.............. 330905102.0
average search depth..............         4.5
nodes per second..................     1266767

Now feel free to show me _any_ AMD/Intel cpu at 600mhz that will run anywhere
near that speed.  Or pick _any_ clock frequency you want.  These machines were
simply running 21264s at 600mhz, period.  The single-cpu machine did not have a
huge L3 cache.

I don't know the specifics about the dual, but it was only about 1.6X faster.  We
later improved this a lot as the "lock" facility we used to start with was
very slow on the alpha architecture.
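
As a rough check on the two outputs above, the sketch below recomputes the
dual-to-single ratio and the average clock cycles spent per node.  The NPS
figures and the 600mhz clock are copied from this post; the helper names are
made up for illustration and are not part of Crafty.

```python
# Figures copied from the two Crafty outputs above (21264 at 600 MHz).
SINGLE_NPS = 783_641      # single-cpu run
DUAL_NPS = 1_266_767      # dual-cpu run (wac)
CLOCK_HZ = 600_000_000    # 600 MHz


def speedup(parallel_nps: int, serial_nps: int) -> float:
    """Ratio of parallel to serial search speed (hypothetical helper)."""
    return parallel_nps / serial_nps


def cycles_per_node(nps: int, clock_hz: int) -> float:
    """Average clock cycles spent per node searched (hypothetical helper)."""
    return clock_hz / nps


if __name__ == "__main__":
    print(f"dual vs single: {speedup(DUAL_NPS, SINGLE_NPS):.2f}x")
    print(f"cycles per node, single cpu: "
          f"{cycles_per_node(SINGLE_NPS, CLOCK_HZ):.0f}")
```

Cycles per node is only a crude way to compare machines at different clock
speeds; it ignores memory and cache differences entirely.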




>
>Relative to the 1.67Ghz from the K7 the alpha achieves like
>a 102/122 x 1.667Ghz = 1.394Ghz K7


Show me that 600mhz K7 that can do .8M nodes per second with crafty...

Then I'll be a believer, not now...



>
>In theory the 4 instructions a clock for alpha versus
>3 instructions a clock for K7 give 33% speedup:
>
>1.000 Ghz + 33% = 1.333Ghz
>
>It achieves however 1.394Ghz
>
>In short i am missing the speedup for being 64 bits at all!


Because you are looking at the wrong data.. :)
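
For anyone who wants to follow the quoted arithmetic, here is a minimal sketch
that reproduces it.  It only restates the calculation from the quoted post,
using the SPEC runtimes and clock speeds given there; it says nothing about
whether those runtimes are the right data to compare.  The function names are
hypothetical.

```python
# Figures as quoted in this thread (SPEC CINT2000 crafty base runtimes).
ALPHA_RUNTIME = 122.0    # 1 GHz alpha
K7_RUNTIME = 102.0       # 1.667 GHz K7
K7_CLOCK_GHZ = 1.667
ALPHA_CLOCK_GHZ = 1.0


def percent_slower(runtime: float, reference_runtime: float) -> float:
    """How much longer `runtime` is than `reference_runtime`, in percent."""
    return (runtime / reference_runtime - 1.0) * 100.0


def equivalent_clock_ghz(runtime: float, reference_runtime: float,
                         reference_clock_ghz: float) -> float:
    """Clock the reference cpu would need to match `runtime`,
    assuming runtime scales inversely with clock (a strong assumption)."""
    return reference_runtime / runtime * reference_clock_ghz


def issue_width_scaled_clock(clock_ghz: float, width: int,
                             reference_width: int) -> float:
    """Clock scaled by the ratio of instructions issued per cycle."""
    return clock_ghz * width / reference_width


if __name__ == "__main__":
    print(f"{percent_slower(ALPHA_RUNTIME, K7_RUNTIME):.1f}% slower")
    print(f"{equivalent_clock_ghz(ALPHA_RUNTIME, K7_RUNTIME, K7_CLOCK_GHZ):.3f} GHz K7 equivalent")
    print(f"{issue_width_scaled_clock(ALPHA_CLOCK_GHZ, 4, 3):.3f} GHz from issue width alone")
```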


>
>
>
>
>
>
>
>
>>
>>
>>>>
>>>>>-Tom




Last modified: Thu, 15 Apr 21 08:11:13 -0700
