Author: Robert Hyatt
Date: 19:55:02 12/04/03
On December 04, 2003 at 22:08:22, Eugene Nalimov wrote:

>http://www.chess-archive.com/ccc.php?art_id=325912
>
>Crafty on quad Opteron 1.8GHz runs at ~6,850mnps (under Windows, of course).
>
>Thanks,
>Eugene

I know. :)

A couple of notes. We are running this quad in NUMA mode, rather than in SMP mode (where it interleaves memory across all nodes page by page). However, I have not yet gotten the NUMA library working. It has the usual NUMA stuff: I can bind a thread to a processor, it has an interleaved malloc() function to distribute large things (hash tables, cache buffers, etc.) equally across memory on all nodes, and so forth.

There is still more performance to be had, although the memory latency on a 4-way box is not that bad. I figure it looks roughly like 60ns local, 120ns for one hop, 180ns for two hops. Not bad, as the 180ns is close to my dual's 150ns normal latency (this assumes you are not blowing out the TLB, of course).

I am continuing to work on the NUMA part...

I think your 6800K was on the bench command? I am getting about 6.1M there at present. I finally got gcc to do profile-directed optimizations, but it made _zero_ improvement, unlike the Intel compiler's PGO. More about that as I investigate... The first step is to write a small test harness and see if the PGO stuff really does things right. There is no way to PGO the multi-threaded code, but I can live without that, as that code is barely executed.

>
>On December 03, 2003 at 14:59:03, Robert Hyatt wrote:
>
>>The other day, someone was discussing WAC. As I have been working on the
>>quad-opteron machine at AMD, I took some time to run WAC three times, one
>>for 1 second per position, one for 5 and one for 10. The results:
>>
>>===================== 1 seconds per position========================
>>test results summary:
>>
>>total positions searched.......... 300
>>number right...................... 297
>>number wrong...................... 3
>>percentage right.................. 99
>>percentage wrong.................. 1
>>total nodes searched.............. 111851199
>>average search depth.............. 4.5
>>nodes per second.................. 6072269
>>
>>===================== 5 seconds per position========================
>>test results summary:
>>
>>total positions searched.......... 300
>>number right...................... 298
>>number wrong...................... 2
>>percentage right.................. 99
>>percentage wrong.................. 0
>>total nodes searched.............. 320786849
>>average search depth.............. 5.6
>>nodes per second.................. 6299702
>>
>>=====================10 seconds per position========================
>>test results summary:
>>
>>total positions searched.......... 300
>>number right...................... 299
>>number wrong...................... 1
>>percentage right.................. 99
>>percentage wrong.................. 0
>>total nodes searched.............. 259379471
>>average search depth.............. 4.6
>>nodes per second.................. 6369720
>>
>>Benchmark:
>>
>>Crafty v19.7 (4 cpus)
>>
>>White(1): mt=4
>>max threads set to 4
>>White(1): bench
>>Running benchmark. . .
>>......
>>Total nodes: 109241860
>>Raw nodes per second: 6068992
>>Total elapsed time: 18
>>SMP time-to-ply measurement: 35.555556
>>White(1):
>>
>>That now includes the inline FirstOne()/LastOne()/PopCnt() 64 bit code I
>>wrote. It is about 4-5% faster. I have not written the attack stuff yet
>>but I suppose I might bite the bullet to see what happens...
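
To put the latency estimates in the reply (60ns local, 120ns for one hop, 180ns for two hops) in perspective, here is a rough sketch of the average latency you would expect if memory were interleaved evenly across all four nodes. The topology is an assumption on my part, not something stated in the post: four nodes wired in a square, so each node sees one local node, two nodes one hop away, and one node two hops away. The function name is made up for illustration.

```c
#include <stddef.h>

/* Latency estimates quoted in the post, in nanoseconds, indexed by hop count. */
static const double latency_ns[3] = { 60.0, 120.0, 180.0 };

/* ASSUMPTION (not from the post): four nodes wired in a square, so from any
   node there is 1 local node, 2 nodes at one hop, and 1 node at two hops. */
static const int nodes_at_hops[3] = { 1, 2, 1 };

/* Expected latency when pages are spread evenly across all nodes:
   a node-count-weighted average of the per-hop latencies. */
double interleaved_avg_latency_ns(void)
{
    double total = 0.0;
    int nodes = 0;
    for (size_t h = 0; h < 3; h++) {
        total += latency_ns[h] * nodes_at_hops[h];
        nodes += nodes_at_hops[h];
    }
    return total / nodes;
}
```

Under those assumptions the weighted average is (60 + 2*120 + 180) / 4 = 120ns, which would explain why keeping a thread's data on its local node (60ns) is worth chasing.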
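
The benchmark figures quoted above can be sanity-checked with one division. This is a minimal sketch, assuming the elapsed time is reported in whole seconds and the raw rate is simply nodes divided by time, truncated; raw_nps is a hypothetical helper, not a Crafty function.

```c
#include <stdint.h>

/* Raw nodes per second, truncated to a whole number.
   Hypothetical helper for illustration; not taken from Crafty. */
uint64_t raw_nps(uint64_t total_nodes, uint64_t elapsed_seconds)
{
    if (elapsed_seconds == 0)
        return 0;               /* avoid division by zero */
    return total_nodes / elapsed_seconds;
}
```

With the quoted numbers, raw_nps(109241860, 18) gives 6068992, which matches the "Raw nodes per second" line in the bench output.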
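
For readers who have not seen bitboard helpers like the FirstOne()/LastOne()/PopCnt() routines mentioned above, here is a portable C sketch of what such routines compute. This is not the inline code from the post: the names are hypothetical, the bit numbering (0 = least significant bit) is an assumption and may not match Crafty's convention, and a tuned version would presumably use the hardware bit-scan instructions rather than loops.

```c
#include <stdint.h>

/* Count the set bits: each iteration clears the lowest set bit. */
int popcnt64(uint64_t b)
{
    int n = 0;
    while (b) {
        b &= b - 1;
        n++;
    }
    return n;
}

/* Index of the lowest set bit (0 = least significant), or -1 if b is 0. */
int lowest_bit64(uint64_t b)
{
    if (b == 0)
        return -1;
    int i = 0;
    while ((b & 1) == 0) {
        b >>= 1;
        i++;
    }
    return i;
}

/* Index of the highest set bit (63 = most significant), or -1 if b is 0. */
int highest_bit64(uint64_t b)
{
    int i = -1;
    while (b) {
        b >>= 1;
        i++;
    }
    return i;
}
```

These routines sit in the inner loops of move generation and evaluation in a bitboard engine, so even small savings per call can add up to the kind of 4-5% overall gain reported above.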
Last modified: Thu, 15 Apr 21 08:11:13 -0700