Author: Tom Likens
Date: 14:48:34 11/26/03
On November 26, 2003 at 16:08:50, Robert Hyatt wrote:

>On November 26, 2003 at 15:36:58, Dann Corbit wrote:
>
>>On November 26, 2003 at 15:25:10, Robert Hyatt wrote:
>>
>>>I have been working both with Eugene and AMD. The following bench run is
>>>on a quad 1.8ghz opteron, 8 gigs of ram. The only "option" I have set is
>>>"mt=4". There is _no_ assembly code in this version, pure C only. I am
>>>looking at updating the asm to 64 bit but that will take some time and
>>>studying.
>>>
>>>Meanwhile:
>>>
>>>Crafty v19.6 (1 cpus)
>>>
>>>White(1): mt=4
>>>max threads set to 4
>>>White(1): bench
>>>Running benchmark. . .
>>>......
>>>Total nodes: 105863114
>>>Raw nodes per second: 5881284
>>>Total elapsed time: 18
>>>SMP time-to-ply measurement: 35.555556
>>>
>>>This is using gcc, although I am not sure whether it is producing 64 bit
>>>or 32 bit code at the moment. However, 5.8M nps is not bad. About 1M less
>>>than Eugene's MSVC numbers. I will look into the 64 bit stuff more to see if
>>>gcc is producing real opteron assembly or not... And I will study the
>>>PGO options although the last time I tried them on GCC the compiler promptly
>>>crashed. :)
>>>
>>>Note that the above is with default hash and everything, no endgame tables,
>>>no opening book, etc...
>>
>>Could we see the numbers for 1,2,3 threads active also?
>>I would be interested to see how it scales.
>
>
>Sure.
>
>one processor:
>
>White(1): bench
>Running benchmark. . .
>......
>Total nodes: 100409437
>Raw nodes per second: 1498648
>Total elapsed time: 67
>SMP time-to-ply measurement: 9.552239
>
>
>two processors:
>
>max threads set to 2
>White(1): bench
>Running benchmark. . .
>......
>Total nodes: 99562452
>Raw nodes per second: 3017044
>Total elapsed time: 33
>SMP time-to-ply measurement: 19.393939
>
>three processors:
>
>max threads set to 3
>White(1): bench
>Running benchmark. . .
>......
>Total nodes: 102543114
>Raw nodes per second: 4458396
>Total elapsed time: 23
>SMP time-to-ply measurement: 27.826087
>
>four processors:
>
>max threads set to 4
>White(1): bench
>Running benchmark. . .
>......
>Total nodes: 102606915
>Raw nodes per second: 5700384
>Total elapsed time: 18
>SMP time-to-ply measurement: 35.555556
>
>
>Let me note here that this is not a very NUMA-aware implementation,
>nowhere near as good as what we did (Eugene and I) for windows. I
>am going to look at the Linux NUMA library tonight and work on getting
>some of those features in, which should further push performance up.
>
>This is way better than 19.4, but it is not "all there" yet. Note also
>that there is no assembly language of any kind in this version, it is pure
>C. I plan on rectifying that _soon_. :)

Sometimes Bob, you not only lift the bar for the rest of us, but you put it
in orbit ;-) I was thinking about entering CCT6 but I'm not sure there's
much point!

BTW, I finally received the AMD FX-51 and my preliminary tests under
Windows XP Pro (I'm loading Linux as I type) give Djinn a (roughly) 4x
speedup. Unlike your test, I am including 32-bit inline assembly, but no
64-bit assembly, which should boost things nicely once it is added. I also
intend to use profile guided optimizations after I get Linux set up to see
how that improves things (hopefully, quite a bit since the Windows version
was compiled specifically for a P4 system).

One caveat: so far I haven't been able to get the 64-bit version of SUSE 9.0
to recognize my SATA hard drives or my Promise controller. The 32-bit
version *does* recognize the components, so that's what I'm loading to get
Linux on it initially. It's not too bad since I have my home directories
mounted on another machine via NFS, and I intend to load the 64-bit version
of the OS when it works.

More info as it becomes available.

regards,
--tom
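For anyone who wants the scaling Dann asked about as ratios rather than raw figures, here is a minimal sketch that works it out from the numbers quoted above. The table name `BENCH` and the function `scaling` are made up for this illustration and are not part of Crafty; the 4-thread entry uses Bob's second run (5700384 nps), not the first one (5881284 nps).

```python
# Figures copied from the quoted bench runs (Crafty 19.6, quad 1.8 GHz Opteron).
# Hypothetical table for illustration: threads -> (raw nodes per second, elapsed seconds)
BENCH = {
    1: (1498648, 67),
    2: (3017044, 33),
    3: (4458396, 23),
    4: (5700384, 18),
}


def scaling(bench):
    """Return per-thread-count speedups relative to the 1-thread run."""
    base_nps, base_time = bench[1]
    rows = {}
    for threads, (nps, elapsed) in sorted(bench.items()):
        nps_speedup = nps / base_nps
        rows[threads] = {
            # How many times more nodes per second than one thread.
            "nps_speedup": nps_speedup,
            # Fraction of ideal (linear) NPS scaling achieved.
            "efficiency": nps_speedup / threads,
            # Ratio of elapsed times; coarse, since times are whole seconds.
            "time_speedup": base_time / elapsed,
        }
    return rows


if __name__ == "__main__":
    for threads, row in scaling(BENCH).items():
        print(
            f"{threads} threads: NPS x{row['nps_speedup']:.2f}, "
            f"efficiency {row['efficiency']:.0%}, "
            f"time x{row['time_speedup']:.2f}"
        )
```

By these figures the raw NPS scales by roughly 2.01x, 2.97x and 3.80x for 2, 3 and 4 threads. The elapsed-time ratios come out a little differently because the reported times are whole seconds and each run searched a slightly different number of nodes.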
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.