Author: Robert Hyatt
Date: 13:08:50 11/26/03
On November 26, 2003 at 15:36:58, Dann Corbit wrote:

>On November 26, 2003 at 15:25:10, Robert Hyatt wrote:
>
>>I have been working both with Eugene and AMD. The following bench run is
>>on a quad 1.8ghz opteron, 8 gigs of ram. The only "option" I have set is
>>"mt=4". There is _no_ assembly code in this version, pure C only. I am
>>looking at updating the asm to 64 bit but that will take some time and
>>studying.
>>
>>Meanwhile:
>>
>>Crafty v19.6 (1 cpus)
>>
>>White(1): mt=4
>>max threads set to 4
>>White(1): bench
>>Running benchmark. . .
>>......
>>Total nodes: 105863114
>>Raw nodes per second: 5881284
>>Total elapsed time: 18
>>SMP time-to-ply measurement: 35.555556
>>
>>This is using gcc, although I am not sure whether it is producing 64 bit
>>or 32 bit code at the moment. However, 5.8M nps is not bad. About 1M less
>>than Eugene's MSVC numbers. I will look into the 64 bit stuff more to see if
>>gcc is producing real opteron assembly or not... And I will study the
>>PGO options although the last time I tried them on GCC the compiler promptly
>>crashed. :)
>>
>>Note that the above is with default hash and everything, no endgame tables,
>>no opening book, etc...
>
>Could we see the numbers for 1,2,3 threads active also?
>I would be interested to see how it scales.

Sure.

one processor:

White(1): bench
Running benchmark. . .
......
Total nodes: 100409437
Raw nodes per second: 1498648
Total elapsed time: 67
SMP time-to-ply measurement: 9.552239

two processors:

max threads set to 2
White(1): bench
Running benchmark. . .
......
Total nodes: 99562452
Raw nodes per second: 3017044
Total elapsed time: 33
SMP time-to-ply measurement: 19.393939

three processors:

max threads set to 3
White(1): bench
Running benchmark. . .
......
Total nodes: 102543114
Raw nodes per second: 4458396
Total elapsed time: 23
SMP time-to-ply measurement: 27.826087

four processors:

max threads set to 4
White(1): bench
Running benchmark. . .
......
Total nodes: 102606915
Raw nodes per second: 5700384
Total elapsed time: 18
SMP time-to-ply measurement: 35.555556

Let me note here that this is not a very NUMA-aware implementation, nowhere
near as good as what we did (Eugene and I) for Windows. I am going to look at
the Linux NUMA library tonight and work on getting some of those features in,
which should further push performance up. This is way better than 19.4, but it
is not "all there" yet.

Note also that there is no assembly language of any kind in this version; it
is pure C. I plan on rectifying that _soon_. :)
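
To put a number on how it scales, here is a rough sketch that works out the speedup from the four runs above. It is not part of Crafty; the names RUNS and scaling are made up for illustration, and the only inputs are the "Raw nodes per second" and "Total elapsed time" figures copied from the bench output. Elapsed time is reported in whole seconds and the node totals differ a little between runs, so the time-based ratio is only approximate.

```python
# Hypothetical helper, not part of Crafty. Figures are copied from the
# bench output above: threads -> (raw nodes per second, elapsed seconds).
RUNS = {
    1: (1498648, 67),
    2: (3017044, 33),
    3: (4458396, 23),
    4: (5700384, 18),
}


def scaling(runs):
    """Return (threads, nps_speedup, time_speedup, nps_efficiency) per run.

    All ratios are relative to the one-thread run.
    """
    base_nps, base_secs = runs[1]
    rows = []
    for threads in sorted(runs):
        nps, secs = runs[threads]
        nps_speedup = nps / base_nps       # how much faster nodes are searched
        time_speedup = base_secs / secs    # how much sooner the bench finishes
        efficiency = nps_speedup / threads # fraction of ideal linear scaling
        rows.append((threads, nps_speedup, time_speedup, efficiency))
    return rows


if __name__ == "__main__":
    for threads, nps_x, time_x, eff in scaling(RUNS):
        print(f"{threads} threads: NPS x{nps_x:.2f}, "
              f"time x{time_x:.2f}, NPS efficiency {eff:.0%}")
```

On these figures the NPS ratio comes out at roughly 2.01, 2.97 and 3.80 for two, three and four threads, and the elapsed-time ratio at roughly 2.03, 2.91 and 3.72. NPS only measures how fast nodes are searched; the time ratio is the closer guide to real search speedup, within the one-second resolution noted above.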
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.