Computer Chess Club Archives

Subject: Re: Simple quad-opteron test

Author: Robert Hyatt

Date: 19:55:02 12/04/03


On December 04, 2003 at 22:08:22, Eugene Nalimov wrote:

>http://www.chess-archive.com/ccc.php?art_id=325912
>
>Crafty on quad Opteron 1.8GHz runs at ~6,850knps (under Windows, of course).
>
>Thanks,
>Eugene

I know.

:)

A couple of notes.  We are running this quad in NUMA mode, rather than
in SMP mode (where it interleaves memory across all nodes page by page).

However, I have not yet gotten the NUMA library working.  It has the usual
NUMA stuff:  I can bind a thread to a processor, it has an interleaved
malloc() function to distribute large things (hash tables, cache
buffers, etc.) equally across memory on all nodes, and so forth.  So there
is still more performance to be had, although the memory latency on a
4-way box is not that bad.  I figure it looks roughly like 60ns local,
120ns for one hop, 180ns for two hops.  Not bad, as the 180ns is close to
my dual's 150ns normal latency.  (This assumes you are not blowing out
the TLB, of course.)
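
As a rough illustration of what those latency figures imply for
page-interleaved memory, here is a minimal C sketch.  It assumes the four
nodes are wired in a square (each node has two neighbors at one hop and one
node at two hops); the post does not state the topology, and the function
names are hypothetical.

```c
#include <assert.h>

/* Latency estimates quoted in the post, in nanoseconds, indexed by hop count. */
static const double latency_ns[3] = { 60.0, 120.0, 180.0 };

/* Assumed topology (not stated in the post): four nodes in a square,
   so nodes i and j are min(d, 4 - d) hops apart, where d = |i - j|. */
int hops_between(int from, int to)
{
    int d = from > to ? from - to : to - from;
    assert(from >= 0 && from < 4 && to >= 0 && to < 4);
    return d < 4 - d ? d : 4 - d;
}

/* Expected latency for a thread running on `node` when pages are spread
   evenly across all four nodes (the interleaved case). */
double interleaved_latency_ns(int node)
{
    double sum = 0.0;
    for (int target = 0; target < 4; target++)
        sum += latency_ns[hops_between(node, target)];
    return sum / 4.0;
}
```

Under that assumed topology the interleaved average is
(60 + 120 + 120 + 180) / 4 = 120ns, twice the 60ns local figure, which is
why keeping per-thread data on the local node is worth the effort.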

I am continuing to work on the NUMA part...  I think your 6800K was on
the bench command?  I am getting about 6.1M there at present.  I finally
got gcc to do profile-directed optimizations, but it made _zero_
improvement, unlike the Intel compiler's PGO.

More about that as I investigate...

The first step is to write a small test harness and see if the PGO stuff
really does things right.  There is no way to PGO the multi-threaded
code, but I can live without that, as that code is barely executed.
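
A harness of that kind could look something like the C sketch below.  It
is only an illustration, not the harness from the post: all function names
are hypothetical, and the kernel is a made-up loop with one heavily biased
branch, which is the kind of code profile data should help with.  With a
current GCC one would build it once with -fprofile-generate, run it, then
rebuild with -fprofile-use and compare timings; the post does not say which
gcc version or flags were used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Fill buf with pseudo-random values from a fixed-seed LCG so runs repeat. */
void fill_values(uint32_t *buf, size_t n)
{
    uint32_t x = 12345u;
    for (size_t i = 0; i < n; i++) {
        x = x * 1664525u + 1013904223u;
        buf[i] = x >> 16;            /* keep the better-mixed high bits */
    }
}

/* Kernel with a heavily biased branch: the rare path is taken roughly
   1 time in 64.  `salt` varies per pass so the call cannot be hoisted. */
uint64_t biased_kernel(const uint32_t *buf, size_t n, uint32_t salt)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = buf[i] ^ salt;
        if ((v & 63u) == 0)
            sum += (uint64_t)v * 3u; /* rare path */
        else
            sum += v & 1u;           /* common path */
    }
    return sum;
}

/* Run `passes` passes over the buffer.  Returns CPU seconds used and
   stores a checksum so the work cannot be optimized away. */
double time_kernel(const uint32_t *buf, size_t n, int passes,
                   uint64_t *checksum)
{
    clock_t start = clock();
    uint64_t total = 0;
    assert(passes > 0);
    for (int p = 0; p < passes; p++)
        total += biased_kernel(buf, n, (uint32_t)p);
    *checksum = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling fill_values() on a buffer of a few million entries and then
time_kernel() with a few hundred passes gives a repeatable timing; the
checksum should be identical between the plain and the PGO build, so any
difference in it points at a miscompile rather than a speedup.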


>
>On December 03, 2003 at 14:59:03, Robert Hyatt wrote:
>
>>The other day, someone was discussing WAC.  As I have been working on the
>>quad-opteron machine at AMD, I took some time to run WAC three times, one
>>for 1 second per position, one for 5 and one for 10.  The results:
>>
>>===================== 1 second per position=========================
>>test results summary:
>>
>>total positions searched..........         300
>>number right......................         297
>>number wrong......................           3
>>percentage right..................          99
>>percentage wrong..................           1
>>total nodes searched..............   111851199
>>average search depth..............         4.5
>>nodes per second..................     6072269
>>
>>===================== 5 seconds per position========================
>>test results summary:
>>
>>total positions searched..........         300
>>number right......................         298
>>number wrong......................           2
>>percentage right..................          99
>>percentage wrong..................           0
>>total nodes searched..............   320786849
>>average search depth..............         5.6
>>nodes per second..................     6299702
>>
>>=====================10 seconds per position========================
>>test results summary:
>>
>>total positions searched..........         300
>>number right......................         299
>>number wrong......................           1
>>percentage right..................          99
>>percentage wrong..................           0
>>total nodes searched..............   259379471
>>average search depth..............         4.6
>>nodes per second..................     6369720
>>
>>Benchmark:
>>
>>Crafty v19.7 (4 cpus)
>>
>>White(1): mt=4
>>max threads set to 4
>>White(1): bench
>>Running benchmark. . .
>>......
>>Total nodes: 109241860
>>Raw nodes per second: 6068992
>>Total elapsed time: 18
>>SMP time-to-ply measurement: 35.555556
>>White(1):
>>
>>That now includes the inline FirstOne()/LastOne()/PopCnt() 64-bit code I
>>wrote.  It is about 4-5% faster.  I have not written the attack stuff yet,
>>but I suppose I might bite the bullet to see what happens...
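
For readers unfamiliar with the bitboard primitives named in the quoted
post, here is a portable C sketch of what such routines compute.  The post
does not show the inline 64-bit code, so these are plain reference versions
under hypothetical names; in particular, which end of the word Crafty's
FirstOne() and LastOne() scan from is not stated in the post, so the two
scans are named by bit significance instead.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits; each iteration clears the lowest set bit. */
int pop_count(uint64_t b)
{
    int n = 0;
    while (b) {
        b &= b - 1;
        n++;
    }
    return n;
}

/* Index (0..63) of the least significant set bit; b must be nonzero. */
int lowest_bit(uint64_t b)
{
    int i = 0;
    assert(b != 0);
    while (!(b & 1u)) {
        b >>= 1;
        i++;
    }
    return i;
}

/* Index (0..63) of the most significant set bit; b must be nonzero. */
int highest_bit(uint64_t b)
{
    int i = 0;
    assert(b != 0);
    while (b >>= 1)
        i++;
    return i;
}
```

Slow loops like these are useful mainly as a cross-check: any faster
inline version should return the same values for every nonzero bitboard.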
