Computer Chess Club Archives


Subject: fiddling with the Opteron - Part II

Author: Robert Hyatt

Date: 11:53:11 01/30/04


I thought I would check on the NPS and SMP scaling, just to see how the Opteron
was doing compared to my Intel boxes.  Recall that my quad Xeon scaled NPS
almost perfectly, but that originally my dual Xeon 2.8 did much worse.  However,
also recall that recent changes have improved the Xeon scaling as well.  Here is
how the Opteron does, which shows it is not having any memory/cache
bottlenecks.  This is _one_ of the Cray Blitz positions from the DTS paper.  I
have run them all, but didn't want to post that much raw data here, although I
can email you 1 run with 1 cpu, four with 2 cpus and four with 4 cpus if you
want something to look at.

Here's the summary; you can look at the raw data below.  This is position 5,
chosen because I ran each test for 5 minutes, and this position was the first to
reach depth=13 with 1 processor.

                 1cpu            2cpus            4cpus
nps             2.17M            4.35M            8.41M
scale             1.0              2.0              3.9
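
For anyone who wants to redo the arithmetic, here is a minimal sketch of how
the "scale" row follows from the "nps" row.  The function name and the
efficiency column are my own additions for illustration; only the three NPS
figures come from the table.

```python
# NPS figures copied from the summary table (nodes per second).
nps = {1: 2.17e6, 2: 4.35e6, 4: 8.41e6}

def nps_scale(nps_by_cpus):
    """Return {cpus: (scale, efficiency)} relative to the 1-cpu figure.

    scale      = nps with N cpus / nps with 1 cpu
    efficiency = scale / N (1.0 would be perfect scaling)
    """
    base = nps_by_cpus[1]
    return {n: (v / base, v / base / n) for n, v in sorted(nps_by_cpus.items())}

scaling = nps_scale(nps)
for cpus, (scale, eff) in scaling.items():
    print(f"{cpus} cpu(s): scale={scale:.1f}  efficiency={eff:.0%}")
```

Rounded to one decimal place this reproduces the 1.0 / 2.0 / 3.9 row.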

I did not look at the actual parallel speedup, although the data for this
one position is given below.  Note that time-based speedup has significant
variance, so NPS is the better comparison for seeing how the hardware is
actually doing.  I.e., 3.9 above means that 4 processors are not getting in each
other's way much at all.  Note that the 2.0 is as good as it can get, of course,
which simply shows that the Opteron is doing well in a parallel sense.
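
If you do want a rough time-to-depth speedup for this one position, it can be
pulled out of the raw data below.  This is only a sketch: it assumes the
"13->" lines mark the end of iteration 13, and the helper name is made up for
the example.  The three times are copied from those lines in the logs.

```python
def to_seconds(stamp):
    """Convert a log time such as '4:52' (m:ss) or '28.17' (seconds) to seconds."""
    if ":" in stamp:
        minutes, seconds = stamp.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(stamp)

# Time at which depth 13 finished ("13->" lines) for 1, 2 and 4 cpus.
depth13_done = {1: "4:52", 2: "1:58", 4: "1:28"}

base = to_seconds(depth13_done[1])
speedup = {n: base / to_seconds(t) for n, t in depth13_done.items()}

for cpus, s in speedup.items():
    print(f"{cpus} cpu(s): depth 13 in {depth13_done[cpus]}, speedup {s:.2f}")
```

On this single run the two-processor figure comes out above 2, which is the
kind of variance that makes NPS the steadier number for judging the hardware.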

Note that for those in the know, this machine is running in NUMA mode, not SMP
mode.  NUMA mode sets two contiguous gigabytes of RAM per processor, while SMP
mode interleaves 4k pages so that page 0 is on cpu 0, page 1 on cpu 1, etc.
Interleaving is probably better for non-NUMA-aware programs; NUMA mode is better
if you take care to get things into local memory.
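
To make the difference between the two layouts concrete, here is a toy model
of which processor's memory a physical address lands on in each mode.  It is a
simplification for illustration, not how the firmware or kernel actually does
the mapping; the function names and the four-processor count are assumptions,
while the 4k page size and two gigabytes per processor come from the
description above.

```python
PAGE_SIZE = 4 * 1024        # 4k pages
NODE_BYTES = 2 * 1024**3    # two contiguous gigabytes per processor
NUM_CPUS = 4                # assumption: a four-processor machine

def node_numa(addr):
    """NUMA mode: each processor owns one contiguous 2 GB range."""
    return addr // NODE_BYTES

def node_interleaved(addr):
    """SMP (interleaved) mode: consecutive 4k pages rotate across processors."""
    return (addr // PAGE_SIZE) % NUM_CPUS

# The first five pages all sit on cpu 0 in NUMA mode,
# but rotate 0, 1, 2, 3, 0 when interleaved.
for page in range(5):
    addr = page * PAGE_SIZE
    print(page, node_numa(addr), node_interleaved(addr))
```

In the interleaved layout any large array is spread evenly whether the program
cares or not; in the contiguous layout a thread only gets local accesses if its
data was placed in its own processor's range.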

More later.


one processor:

solution 1. bxc6
              time surplus   0.00  time limit 5:00 (5:00)
              depth   time  score   variation (1)
                6     0.10     ++   1. ... bxc6!!
                6     0.14  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
                                    7. Bxe4
                6->   0.15  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
                                    7. Bxe4
                7     0.28     ++   1. ... bxc6!!
                7->   0.34  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
                                    7. Bxe4
                8     0.80  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
                                    7. Ba6 Qxd4+
                8->   0.91  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
                                    7. Ba6 Qxd4+
                9     2.89  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
                9->   3.19  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
               10     7.40  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
               10->   8.18  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
               11    25.29  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
               11->  28.17  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
               12     1:00  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
               12->   2:13  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
               13     4:02  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
               13->   4:52  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
              time=5:00  cpu=99%  mat=-1  n=650939746  fh=94%  nps=2.17M
              ext-> chk=42151085 cap=1623638 pp=804158 1rep=1876597 mate=63368
              predicted=0  nodes=650939746  evals=128158996
              endgame tablebase-> probes=0  hits=0
              SMP->  split=0  stop=0  data=0/128  cpu=4:59  elap=5:00
----------------------> solution correct (5/5).


two processors:
solution 1. bxc6
              time surplus   0.00  time limit 5:00 (5:00)
              depth   time  score   variation (1)
                6     0.06     ++   1. ... bxc6!!
                6     0.08  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
                                    7. Bxe4
                6->   0.08  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
                                    7. Bxe4
                7     0.16     ++   1. ... bxc6!!
                7->   0.19  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
                                    7. Bxe4 (s=3)
                8     0.45  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
                                    7. Ba6 Qxd4+ (s=2)
                8->   0.51  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
                                    7. Ba6 Qxd4+
                9     1.43  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
                9->   1.59  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
               10     3.60  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
               10->   3.99  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
               11    11.35  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
               11->  21.85  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Bd6 (s=2)
               12    39.37  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
               12->  45.05  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
               13     1:40  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
               13->   1:58  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
              time=5:00  cpu=199%  mat=-1  n=1306020483  fh=94%  nps=4.35M
              ext-> chk=89727132 cap=3344485 pp=1438540 1rep=4309007 mate=152018
              predicted=0  nodes=1306020483  evals=251560034
              endgame tablebase-> probes=0  hits=0
              SMP->  split=896  stop=116  data=6/128  cpu=9:59  elap=5:00


four cpus:

solution 1. bxc6
              time surplus   0.00  time limit 5:00 (5:00)
              depth   time  score   variation (1)
                5->   0.03   0.30   1. ... bxc6 2. Ne4 Nxe4 3. Qxg4+ Kc7
                                    4. Qf4+ Bd6 5. Qxf7+
                6     0.04     ++   1. ... bxc6!!
                6     0.06  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
                                    7. Bxe4
                6->   0.06  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
                                    7. Bxe4
                7     0.10     ++   1. ... bxc6!!
                7->   0.13  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
                                    7. Bxe4 (s=3)
                8     0.28  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
                                    7. Ba6 Qxd4+ (s=2)
                8->   0.34  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
                                    7. Ba6 Qxd4+
                9     0.89  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
                9->   1.02  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
               10     2.16  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
               10->   2.45  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
                                    7. Rxe4 cxb5 8. Rxg4
               11     6.22  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
               11->   7.08  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
               12    15.70  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
               12->  38.28  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
                                    (s=2)
               13     1:07  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
               13->   1:28  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
                                    (s=2)
               14     4:20  -0.41   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
                                    7. Qxe4 Qxe4 8. Rxe4 Rc8 9. Bd2 Nc4
                                    10. Bc3
              time=5:00  cpu=398%  mat=-1  n=2483899624  fh=93%  nps=8.28M
              ext-> chk=163866744 cap=6703544 pp=3608824 1rep=7938829 mate=298937
              predicted=0  nodes=2483899624  evals=565584863
              endgame tablebase-> probes=0  hits=0
              SMP->  split=34290  stop=5494  data=16/128  cpu=19:55  elap=5:00


