Computer Chess Club Archives


Subject: Re: fiddling with the Opteron - Part II

Author: Mark Rawlings

Date: 12:14:40 01/30/04


I'm probably missing something, but how can two processors search to a
given depth _more_ than twice as fast as one processor?  I.e., 4:52 to get
through 13 ply for one processor vs. only 1:58 for two processors.  (Maybe there
is an element of luck as to what gets found in the hash?)
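
To put numbers on the question, here is a minimal Python sketch of the time-to-depth arithmetic. The times are the "13->" lines from the quoted output; the helper name `to_seconds` is just something made up for this example, and it assumes the engine prints times under a minute as plain seconds and longer ones as min:sec.

```python
def to_seconds(clock: str) -> float:
    """Convert a time like '4:52' (min:sec) or '28.17' (seconds) to seconds."""
    if ":" in clock:
        minutes, seconds = clock.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(clock)


# Time at which each run finished depth 13 (the "13->" lines in the quoted output).
depth13_done = {1: "4:52", 2: "1:58", 4: "1:28"}

base = to_seconds(depth13_done[1])
speedup = {cpus: base / to_seconds(t) for cpus, t in depth13_done.items()}

for cpus, s in speedup.items():
    print(f"{cpus} cpu(s): {s:.2f}x")
```

That gives roughly 2.47x for two processors (the super-linear case I am asking about) and roughly 3.32x for four, for this one position only.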

Thanks,

Mark



On January 30, 2004 at 14:53:11, Robert Hyatt wrote:

>I thought I would check on the NPS and SMP scaling, just to see how the Opteron
>was doing compared to my Intel boxes.  Recall that my quad Xeon scaled NPS
>almost perfectly, but that originally my dual Xeon 2.8 did much worse.  However,
>also recall that recent changes have improved the Xeon scaling as well.  Here is
>how the Opteron does, which shows it is not having any memory/cache
>bottlenecks.  This is _one_ of the Cray Blitz positions from the DTS paper.  I
>have run them all, but didn't want to post that much raw data here, although I
>can email you one run with 1 cpu, four with 2 cpus and four with 4 cpus if you
>want something to look at.
>
>Here's the summary; you can look at the raw data below (this is position 5,
>chosen because I ran each test for 5 minutes, and this position was the first to
>reach depth=13 with 1 processor).
>
>                 1cpu            2cpus            4cpus
>nps             2.17M            4.35M            8.41M
>scale             1.0              2.0              3.9
>
>I did not look at the actual parallel speedup, although the data for this
>one position is given below.  Note that the time-to-depth speedup has significant
>variance, while the NPS is the better comparison to see how the hardware is
>actually doing.  I.e., 3.9 above means that 4 processors are not getting in each
>other's way much at all.  Note that the 2.0 is as good as it can get, of course,
>which simply shows that the Opteron is doing well in a parallel sense.
>
>Note that for those in the know, this machine is running in NUMA mode, not SMP
>mode.  NUMA mode sets two contiguous gigabytes of RAM per processor; SMP mode
>interleaves 4k pages so that page 0 is on cpu 0, page 1 on cpu 1, etc.  That is
>probably better for non-NUMA-aware programs, while NUMA mode is better if you
>take care to get things into local memory.
>
>More later.
>
>
>one processor:
>
>solution 1. bxc6
>              time surplus   0.00  time limit 5:00 (5:00)
>              depth   time  score   variation (1)
>                6     0.10     ++   1. ... bxc6!!
>                6     0.14  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                    7. Bxe4
>                6->   0.15  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                    7. Bxe4
>                7     0.28     ++   1. ... bxc6!!
>                7->   0.34  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                    7. Bxe4
>                8     0.80  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                    7. Ba6 Qxd4+
>                8->   0.91  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                    7. Ba6 Qxd4+
>                9     2.89  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>                9->   3.19  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>               10     7.40  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>               10->   8.18  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>               11    25.29  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>               11->  28.17  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>               12     1:00  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>               12->   2:13  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>               13     4:02  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>               13->   4:52  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>              time=5:00  cpu=99%  mat=-1  n=650939746  fh=94%  nps=2.17M
>              ext-> chk=42151085 cap=1623638 pp=804158 1rep=1876597 mate=63368
>              predicted=0  nodes=650939746  evals=128158996
>              endgame tablebase-> probes=0  hits=0
>              SMP->  split=0  stop=0  data=0/128  cpu=4:59  elap=5:00
>----------------------> solution correct (5/5).
>
>
>two processors:
>solution 1. bxc6
>              time surplus   0.00  time limit 5:00 (5:00)
>              depth   time  score   variation (1)
>                6     0.06     ++   1. ... bxc6!!
>                6     0.08  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                    7. Bxe4
>                6->   0.08  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                    7. Bxe4
>                7     0.16     ++   1. ... bxc6!!
>                7->   0.19  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                    7. Bxe4 (s=3)
>                8     0.45  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                    7. Ba6 Qxd4+ (s=2)
>                8->   0.51  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                    7. Ba6 Qxd4+
>                9     1.43  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>                9->   1.59  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>               10     3.60  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>               10->   3.99  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>               11    11.35  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>               11->  21.85  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6 (s=2)
>               12    39.37  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>               12->  45.05  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>               13     1:40  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>               13->   1:58  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>              time=5:00  cpu=199%  mat=-1  n=1306020483  fh=94%  nps=4.35M
>              ext-> chk=89727132 cap=3344485 pp=1438540 1rep=4309007 mate=152018
>              predicted=0  nodes=1306020483  evals=251560034
>              endgame tablebase-> probes=0  hits=0
>              SMP->  split=896  stop=116  data=6/128  cpu=9:59  elap=5:00
>
>
>four cpus:
>
>solution 1. bxc6
>              time surplus   0.00  time limit 5:00 (5:00)
>              depth   time  score   variation (1)
>                5->   0.03   0.30   1. ... bxc6 2. Ne4 Nxe4 3. Qxg4+ Kc7
>                                    4. Qf4+ Bd6 5. Qxf7+
>                6     0.04     ++   1. ... bxc6!!
>                6     0.06  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                    7. Bxe4
>                6->   0.06  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                    7. Bxe4
>                7     0.10     ++   1. ... bxc6!!
>                7->   0.13  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                    7. Bxe4 (s=3)
>                8     0.28  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                    7. Ba6 Qxd4+ (s=2)
>                8->   0.34  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                    7. Ba6 Qxd4+
>                9     0.89  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>                9->   1.02  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>               10     2.16  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>               10->   2.45  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                    7. Rxe4 cxb5 8. Rxg4
>               11     6.22  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>               11->   7.08  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>               12    15.70  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>               12->  38.28  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>                                    (s=2)
>               13     1:07  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>               13->   1:28  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>                                    (s=2)
>               14     4:20  -0.41   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                    7. Qxe4 Qxe4 8. Rxe4 Rc8 9. Bd2 Nc4
>                                    10. Bc3
>              time=5:00  cpu=398%  mat=-1  n=2483899624  fh=93%  nps=8.28M
>              ext-> chk=163866744 cap=6703544 pp=3608824 1rep=7938829 mate=298937
>              predicted=0  nodes=2483899624  evals=565584863
>              endgame tablebase-> probes=0  hits=0
>              SMP->  split=34290  stop=5494  data=16/128  cpu=19:55  elap=5:00
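
For comparison with the summary table, the NPS figures can be recomputed from the quoted runs. This is a rough sketch that assumes NPS is simply the "n=" node count divided by the 5:00 elapsed time; the engine may compute it slightly differently.

```python
ELAPSED_SECONDS = 5 * 60  # each quoted run reports elap=5:00

# Node counts from the "n=" field of each run's summary line.
nodes = {1: 650_939_746, 2: 1_306_020_483, 4: 2_483_899_624}

nps = {cpus: n / ELAPSED_SECONDS for cpus, n in nodes.items()}
scale = {cpus: nps[cpus] / nps[1] for cpus in nps}

for cpus in nps:
    print(f"{cpus} cpu(s): {nps[cpus] / 1e6:.2f}M nps, scale {scale[cpus]:.2f}")
```

For this particular four-cpu run that works out to about 8.28M nps (scale about 3.8), matching the nps=8.28M printed in the output rather than the 8.41M in the summary table. The summary figure may come from a different one of the four runs, or an average, but that is only a guess.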



Last modified: Thu, 15 Apr 21 08:11:13 -0700
