Author: Robert Hyatt
Date: 07:32:08 03/24/02
On March 24, 2002 at 09:49:34, Vincent Diepeveen wrote:

>On March 24, 2002 at 00:00:30, Robert Hyatt wrote:
>
>>On March 23, 2002 at 17:21:10, Slater Wold wrote:
>>
>>>On March 23, 2002 at 17:07:53, Sune Fischer wrote:
>>>
>>>>On March 23, 2002 at 15:58:19, Tom Kerrigan wrote:
>>>>
>>>>>On March 23, 2002 at 09:53:13, Dan Andersson wrote:
>>>>>
>>>>>>As seen in:
>>>>>>http://www.aceshardware.com/read.jsp?id=45000312
>>>>>>A chess program using traditional work scheduling algorithms will not be using
>>>>>>the Hammer architecture at its most effective. But it won't be all that bad due
>>>>>>to the HyperTransport tunnels. And high bandwidth memory. A funny consequence of
>>>>>>the architecture is that SMP multiprocessing is achieved by having software
>>>>>>drivers.
>>>>>
>>>>>I don't know what you mean by "traditional work scheduling algorithms" but the
>>>>>Hammer will be great for running chess programs out of the box. The only way to
>>>>>make it faster would be to recompile the programs for x86-64, which reportedly
>>>>>yields a 10-15% performance gain.
>>>>
>>>>The Hammer is a 64-bit chip, I expect it to bring a lot more than just 10-15% in
>>>>chess, more like 100-150% for those progs with bitboards.
>>>>
>>>>-S.
>>>
>>>You're dreaming. Alpha's don't get *anywhere* near that kind of gain. More
>>>like the 10-15% that Tom said.
>>>
>>
>>
>>Depends. Tim Mann produced > 1M nodes per second on a 600mhz alpha. NO
>>600 mhz Intel will come within 1/2 that total...
>
>http://www.specbench.org/cgi-bin/osgresults?conf=cint2000
>
>the fastest Alpha:
>
>http://www.specbench.org/osg/cpu2000/results/res2001q4/cpu2000-20011022-01046.html
>
>4 CPUs in total 8 MB L2 cache each cpu, and 1 cpu enabled,
>which means probably that the cpu running crafty benchmark
>was PROFITTING from the other 3 cpu's L2 cache too (classical trick)

This has nothing to do with the test Tim ran. He had a simple alpha
workstation _on_ _his_ _desk_. Not an 8 cpu machine. Just a workstation.

>
>So it was using in total 32MB L2 cache where 1 cpu has 8 MB.

Wrong. In fact, the alpha Tim used didn't even have 8MB of L3 (not L2)
cache. On the alpha, L1 and L2 are both small and on the cpu die itself.
L3 is off-chip.

>
>Despite that at 1 Ghz its performance for crafty base runtime is 122.
>
>Note this is a very recent test. November 2001.
>
>Now latest result for K7 processors which are 32 bits:
>
>http://www.specbench.org/osg/cpu2000/results/res2002q1/cpu2000-20020114-01202.html
>
>So this is a single cpu system. No cheating doing test on a quad like
>alpha did (or SUN/IBM keep doing).
>
>Base runtime 102.
>
>So in short alpha 1Ghz with 64 bits registers and
>4 instructions a clock and cheating with L2 cache it
>all results in being 100% x (122/102) (minus 100%) = 19.6% slower
>than a processor clocked single cpu at 1.667Ghz

I'm not worried about _any_ of those numbers. I am using the real numbers
Tim got when he was running Crafty (and gnuchessx) on ICC last year. I
watched games, and saw the NPS, and asked him about it. He sent me a lot
of output. I just took a quick glance and found he sent me two different
sets of output from two different machines. The first was a single-cpu
600mhz 21264, which produced .8M nodes per second. The other machine was a
dual-cpu box (that he couldn't use as much) which produced a faster result
but which also led us to the "lockless" hashtable to improve performance.

Here is the 21264 single-cpu output (600mhz):

total positions searched.......... 300
number right...................... 300
number wrong...................... 0
percentage right.................. 100
percentage wrong.................. 0
total nodes searched.............. 236973211.0
average search depth.............. 4.5
nodes per second.................. 783641

Here is the dual 21264 output (from running wac):

total positions searched.......... 300
number right...................... 300
number wrong...................... 0
percentage right.................. 100
percentage wrong.................. 0
total nodes searched.............. 330905102.0
average search depth.............. 4.5
nodes per second.................. 1266767

Now feel free to show me _any_ AMD/Intel cpu at 600mhz that will run
anywhere near that speed. Or pick _any_ clock frequency you want. These
machines were simply running 21264's at 600mhz period. The single-cpu
output did not have a huge L3 cache. I don't know the specifics about the
dual, but it was only 1.5X faster. We later improved this a lot as the
"lock" facility we used to start with was very slow on the alpha
architecture.

>
>Relative to the 1.67Ghz from the K7 the alpha achieves like
>a 102/122 x 1.667Ghz = 1.394Ghz K7

Show me that 600mhz K7 that can do .8M nodes per second with crafty...
Then I'll be a believer, not now...

>
>In theory the 4 instructions a clock for alpha versus
>3 instructions a clock for K7 give 33% speedup:
>
>1.000 Ghz + 33% = 1.333Ghz
>
>It achieves however 1.394Ghz
>
>In short i am missing the speedup for being 64 bits at all!

Because you are looking at the wrong data.. :)

>
>>
>>
>>>>
>>>>>-Tom
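
For anyone who wants to re-check the arithmetic traded back and forth above, here is a minimal Python sketch. It uses only the figures quoted in the thread (SPEC crafty base runtimes of 122 and 102, the 1.667 GHz K7 clock, and the two Crafty NPS results at 600 MHz). The function and constant names are made up for illustration and are not part of Crafty or SPEC, and the per-MHz figure is just one possible way to normalize by clock, not something either poster computed.

```python
# Figures below are copied from the posts above; all names are illustrative.
ALPHA_RUNTIME = 122.0    # SPEC crafty base runtime quoted for the 1 GHz Alpha
K7_RUNTIME = 102.0       # SPEC crafty base runtime quoted for the 1.667 GHz K7
K7_CLOCK_GHZ = 1.667

SINGLE_NPS = 783641      # single-cpu 600 MHz 21264, Crafty test output
DUAL_NPS = 1266767       # dual 21264, Crafty test output
ALPHA_CLOCK_MHZ = 600


def percent_slower(runtime_a, runtime_b):
    """How much slower A is than B, in percent, from runtimes (lower is faster)."""
    return (runtime_a / runtime_b - 1.0) * 100.0


def equivalent_clock_ghz(ref_runtime, other_runtime, ref_clock_ghz):
    """Clock the reference cpu would need to match the other cpu's runtime,
    assuming runtime scales inversely with clock (a simplification)."""
    return ref_runtime / other_runtime * ref_clock_ghz


def nps_per_mhz(nps, clock_mhz):
    """Nodes per second divided by clock speed in MHz."""
    return nps / clock_mhz


if __name__ == "__main__":
    print(f"Alpha slower than K7 on SPEC crafty: "
          f"{percent_slower(ALPHA_RUNTIME, K7_RUNTIME):.1f}%")
    print(f"K7-equivalent clock of the Alpha: "
          f"{equivalent_clock_ghz(K7_RUNTIME, ALPHA_RUNTIME, K7_CLOCK_GHZ):.3f} GHz")
    print(f"Single 21264 NPS per MHz: "
          f"{nps_per_mhz(SINGLE_NPS, ALPHA_CLOCK_MHZ):.0f}")
    print(f"Dual vs single NPS ratio: {DUAL_NPS / SINGLE_NPS:.2f}")
```

The first two numbers reproduce the SPEC-based figures in the quoted text; the last two come from the Crafty output, which is the data the reply argues should be used instead.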
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.