Author: Eugene Nalimov
Date: 18:10:57 03/24/02
On March 24, 2002 at 10:32:08, Robert Hyatt wrote:

>On March 24, 2002 at 09:49:34, Vincent Diepeveen wrote:
>
>>On March 24, 2002 at 00:00:30, Robert Hyatt wrote:
>>
>>>On March 23, 2002 at 17:21:10, Slater Wold wrote:
>>>
>>>>On March 23, 2002 at 17:07:53, Sune Fischer wrote:
>>>>
>>>>>On March 23, 2002 at 15:58:19, Tom Kerrigan wrote:
>>>>>
>>>>>>On March 23, 2002 at 09:53:13, Dan Andersson wrote:
>>>>>>
>>>>>>>As seen in:
>>>>>>>http://www.aceshardware.com/read.jsp?id=45000312
>>>>>>>A chess program using traditional work scheduling algorithms will not be using
>>>>>>>the Hammer architecture at its most effective. But it won't be all that bad due
>>>>>>>to the HyperTransport tunnels. And high bandwidth memory. A funny consequence of
>>>>>>>the architecture is that SMP multiprocessing is achieved by having software
>>>>>>>drivers.
>>>>>>
>>>>>>I don't know what you mean by "traditional work scheduling algorithms" but the
>>>>>>Hammer will be great for running chess programs out of the box. The only way to
>>>>>>make it faster would be to recompile the programs for x86-64, which reportedly
>>>>>>yields a 10-15% performance gain.
>>>>>
>>>>>The Hammer is a 64-bit chip, I expect it to bring a lot more than just 10-15% in
>>>>>chess, more like 100-150% for those progs with bitboards.
>>>>>
>>>>>-S.
>>>>
>>>>You're dreaming. Alpha's don't get *anywhere* near that kind of gain. More
>>>>like the 10-15% that Tom said.
>>>>
>>>
>>>Depends. Tim Mann produced > 1M nodes per second on a 600mhz alpha. NO
>>>600 mhz Intel will come within 1/2 that total...
>>
>>http://www.specbench.org/cgi-bin/osgresults?conf=cint2000
>>
>>the fastest Alpha:
>>
>>http://www.specbench.org/osg/cpu2000/results/res2001q4/cpu2000-20011022-01046.html
>>
>>4 CPUs in total 8 MB L2 cache each cpu, and 1 cpu enabled,
>>which means probably that the cpu running crafty benchmark
>>was PROFITTING from the other 3 cpu's L2 cache too (classical trick)
>
>This has nothing to do with the test Tim ran. He had a simple alpha workstation
>_on_ _his_ _desk_. Not an 8 cpu machine. Just a workstation.
>
>>
>>So it was using in total 32MB L2 cache where 1 cpu has 8 MB.
>
>Wrong. In fact, the alpha Tim used didn't even have 8MB of L3 (not L2)
>cache. On the alpha, L1 and L2 are both small and on the cpu die itself.
>L3 is off-chip.
>
>>
>>Despite that at 1 Ghz its performance for crafty base runtime is 122.
>>
>>Note this is a very recent test. November 2001.
>>
>>Now latest result for K7 processors which are 32 bits:
>>
>>http://www.specbench.org/osg/cpu2000/results/res2002q1/cpu2000-20020114-01202.html
>>
>>So this is a single cpu system. No cheating doing test on a quad like
>>alpha did (or SUN/IBM keep doing).
>>
>>Base runtime 102.
>>
>>So in short alpha 1Ghz with 64 bits registers and
>>4 instructions a clock and cheating with L2 cache it
>>all results in being 100% x (122/102) (minus 100%) = 19.6% slower
>>than a processor clocked single cpu at 1.667Ghz
>
>I'm not worried about _any_ of those numbers. I am using the real numbers
>Tim got when he was running Crafty (and gnuchessx) on ICC last year. I watched
>games, and saw the NPS, and asked him about it. He sent me a lot of output
>. I just took a quick glance and found he sent me two different sets of output
>from two different machines. The first was a single-cpu 600mhz 21264, which
>produced .8M nodes per second. The other machine was a dual-cpu box (that he
>couldn't use as much) which produced a faster result but which also led us to
>the "lockless" hashtable to improve performance.
>
>Here is the 21264 single-cpu output (600mhz):
>
>total positions searched.......... 300
>number right...................... 300
>number wrong...................... 0
>percentage right.................. 100
>percentage wrong.................. 0
>total nodes searched.............. 236973211.0
>average search depth.............. 4.5
>nodes per second.................. 783641
>
>Here is the dual 21264 output (from running wac):
>
>total positions searched.......... 300
>number right...................... 300
>number wrong...................... 0
>percentage right.................. 100
>percentage wrong.................. 0
>total nodes searched.............. 330905102.0
>average search depth.............. 4.5
>nodes per second.................. 1266767
>
>Now feel free to show me _any_ AMD/Intel cpu at 600mhz that will run anywhere
>near that speed. Or pick _any_ clock frequency you want. These machines were
>simply running 21264's at 600mhz period. The single-cpu output did not have a
>huge L3 cache.

We are talking about *absolute* performance, not about "nodes per MHz", right?
Right now I can buy a 2.2GHz x86 system, but I believe that the fastest Alpha is
1GHz. I just ran Crafty's "bench" on my 1.13GHz notebook and got 620knps, and I
think that the fastest x86 system will be faster than the fastest Alpha.

Eugene

>I don't know the specifics about the dual, but it was only 1.5X faster. We
>later improved this a lot as the "lock" facility we used to start with was
>very slow on the alpha architecture.
>
>>
>>Relative to the 1.67Ghz from the K7 the alpha achieves like
>>a 102/122 x 1.667Ghz = 1.394Ghz K7
>
>Show me that 600mhz K7 that can do .8M nodes per second with crafty...
>
>Then I'll be a believer, not now...
>
>>
>>In theory the 4 instructions a clock for alpha versus
>>3 instructions a clock for K7 give 33% speedup:
>>
>>1.000 Ghz + 33% = 1.333Ghz
>>
>>It achieves however 1.394Ghz
>>
>>In short i am missing the speedup for being 64 bits at all!
>
>Because you are looking at the wrong data.. :)
>
>>>>>>-Tom
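
To make the "absolute NPS" versus "nodes per MHz" distinction concrete, here is a rough Python sketch using only the figures quoted in this thread. The function and constant names are made up for illustration, and the notebook clock is taken as 1.13 GHz, which is an assumption about what the post meant. The dual/single ratio compares raw NPS only; the two runs searched different node totals, so it says nothing about time-to-solution.

```python
# Figures quoted in this thread. All names here are illustrative only.
ALPHA_MHZ = 600                # both 21264 machines ran at 600 MHz
ALPHA_SINGLE_NPS = 783_641     # single-cpu 21264 output
ALPHA_DUAL_NPS = 1_266_767     # dual 21264 output
NOTEBOOK_MHZ = 1130            # assumption: the notebook runs at 1.13 GHz
NOTEBOOK_NPS = 620_000         # "620knps" from Crafty's "bench"


def nps_per_mhz(nps: float, mhz: float) -> float:
    """Clock-normalised speed: nodes per second per MHz."""
    return nps / mhz


def nps_ratio(a_nps: float, b_nps: float) -> float:
    """How many times faster a is than b in raw nodes per second."""
    return a_nps / b_nps


if __name__ == "__main__":
    # Absolute speed: which box searches more nodes per second?
    print("Alpha single NPS:", ALPHA_SINGLE_NPS)
    print("Notebook NPS:    ", NOTEBOOK_NPS)

    # Clock-normalised speed: the same data seen "per MHz".
    print("Alpha nodes/MHz:   ", round(nps_per_mhz(ALPHA_SINGLE_NPS, ALPHA_MHZ), 1))
    print("Notebook nodes/MHz:", round(nps_per_mhz(NOTEBOOK_NPS, NOTEBOOK_MHZ), 1))

    # Raw NPS ratio of the dual box over the single box.
    print("Dual/single NPS ratio:", round(nps_ratio(ALPHA_DUAL_NPS, ALPHA_SINGLE_NPS), 2))
```

The per-MHz figure favours the Alpha, while the absolute comparison depends on the highest clock each architecture can actually be bought at, which is the point of the reply above.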
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.