Computer Chess Club Archives


Subject: Re: fiddling with the Opteron - Part II

Author: Robert Hyatt

Date: 13:59:00 01/30/04


On January 30, 2004 at 15:14:40, Mark Rawlings wrote:

>I'm probably missing something, but how can using two processors search to a
>given depth _more_ than twice as fast as one processor?  i.e. 4:52 to get
>through 13 ply for one processor vs. only 1:58 for two processors?  (Maybe there
>is an element of luck as to what gets found in the hash?)

Several reasons.  Hash is one.  Bad move ordering at the root is another, as
that might let the second cpu search the right move before the first cpu has
finished searching the move it started on...
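
To put numbers on Mark's question: time-to-depth speedup is the one-processor time divided by the n-processor time needed to finish the same iteration. Below is a minimal Python sketch of that arithmetic. It uses the completion times (the `depth->` lines) of the last three iterations in the logs below; `to_seconds`, `done` and `speedup` are hypothetical helper names, not anything from Crafty.

```python
def to_seconds(t: str) -> float:
    """Convert a Crafty-style time string ('4:52' or '28.17') to seconds."""
    if ":" in t:
        minutes, seconds = t.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(t)

# Completion time of each iteration (the "depth->" lines) from the quoted logs.
done = {
    1: {11: "28.17", 12: "2:13", 13: "4:52"},
    2: {11: "21.85", 12: "45.05", 13: "1:58"},
    4: {11: "7.08", 12: "38.28", 13: "1:28"},
}

def speedup(depth: int, cpus: int) -> float:
    """Time-to-depth speedup: 1-cpu time divided by n-cpu time."""
    return to_seconds(done[1][depth]) / to_seconds(done[cpus][depth])

for depth in (11, 12, 13):
    s2, s4 = speedup(depth, 2), speedup(depth, 4)
    # efficiency = speedup / cpus; a value above 1.0 is "superlinear"
    print(f"depth {depth}: 2 cpus {s2:.2f} (eff {s2 / 2:.2f}), "
          f"4 cpus {s4:.2f} (eff {s4 / 4:.2f})")
```

On these numbers two processors finish depth 13 about 2.47 times faster than one (efficiency above 1.0), but finish depth 11 only about 1.29 times faster, which is the kind of variance a single position shows.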



>
>Thanks,
>
>Mark
>
>
>
>On January 30, 2004 at 14:53:11, Robert Hyatt wrote:
>
>>I thought I would check on the NPS and SMP scaling, just to see how the opteron
>>was doing compared to my other Intel boxes.  Recall that my quad xeon scaled NPS
>>almost perfectly, but that originally my dual xeon 2.8 did much worse.  However,
>>also recall that recent changes have improved the xeon scaling as well.  Here is
>>how the Opteron does, which shows it is not having any memory/cache
>>bottlenecks.  This is _one_ of the Cray Blitz positions from the DTS paper.  I
>>have run them all, but didn't want to post that much raw data here although I
>>can email you 1 run with 1 cpu, four with 2 cpus and four with 4 cpus if you
>>want something to look at.
>>
>>Here's the summary; you can look at the raw data below.  (This is position 5,
>>chosen because I ran each test for 5 minutes, and this position was the first to
>>reach depth=13 with 1 processor.)
>>
>>                1 cpu           2 cpus           4 cpus
>>NPS             2.17M            4.35M            8.41M
>>NPS scaling       1.0              2.0              3.9
>>
>>I did not look at the actual parallel speedup, although the data for this
>>one position is given below.  Time-to-depth speedup has significant variance,
>>while NPS is the better comparison for seeing how the hardware itself is
>>doing.  I.e., the 3.9 above means that 4 processors are not getting in each
>>other's way much at all, and 2.0 is, of course, as good as it can get.  This
>>simply shows that the Opteron is doing well in a parallel sense.
>>
>>Note that for those in the know, this machine is running in NUMA mode, not SMP
>>mode.  NUMA mode gives each processor two contiguous gigabytes of RAM, while SMP
>>mode interleaves 4k pages so that page 0 is on cpu 0, page 1 on cpu 1, etc.
>>SMP mode is probably better for non-NUMA-aware programs; NUMA mode is better if
>>you take care to get things into local memory.
>>
>>More later.
>>
>>
>>one processor:
>>
>>solution 1. bxc6
>>              time surplus   0.00  time limit 5:00 (5:00)
>>              depth   time  score   variation (1)
>>                6     0.10     ++   1. ... bxc6!!
>>                6     0.14  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>>                                    7. Bxe4
>>                6->   0.15  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>>                                    7. Bxe4
>>                7     0.28     ++   1. ... bxc6!!
>>                7->   0.34  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>>                                    7. Bxe4
>>                8     0.80  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>>                                    7. Ba6 Qxd4+
>>                8->   0.91  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>>                                    7. Ba6 Qxd4+
>>                9     2.89  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>                9->   3.19  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>               10     7.40  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>               10->   8.18  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>               11    25.29  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>>               11->  28.17  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>>               12     1:00  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>>               12->   2:13  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>>               13     4:02  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>>               13->   4:52  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>>              time=5:00  cpu=99%  mat=-1  n=650939746  fh=94%  nps=2.17M
>>              ext-> chk=42151085 cap=1623638 pp=804158 1rep=1876597 mate=63368
>>              predicted=0  nodes=650939746  evals=128158996
>>              endgame tablebase-> probes=0  hits=0
>>              SMP->  split=0  stop=0  data=0/128  cpu=4:59  elap=5:00
>>----------------------> solution correct (5/5).
>>
>>
>>two processors:
>>solution 1. bxc6
>>              time surplus   0.00  time limit 5:00 (5:00)
>>              depth   time  score   variation (1)
>>                6     0.06     ++   1. ... bxc6!!
>>                6     0.08  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>>                                    7. Bxe4
>>                6->   0.08  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>>                                    7. Bxe4
>>                7     0.16     ++   1. ... bxc6!!
>>                7->   0.19  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>>                                    7. Bxe4 (s=3)
>>                8     0.45  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>>                                    7. Ba6 Qxd4+ (s=2)
>>                8->   0.51  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>>                                    7. Ba6 Qxd4+
>>                9     1.43  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>                9->   1.59  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>               10     3.60  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>               10->   3.99  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>               11    11.35  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>>               11->  21.85  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6 (s=2)
>>               12    39.37  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>>               12->  45.05  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>>               13     1:40  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>>               13->   1:58  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>>              time=5:00  cpu=199%  mat=-1  n=1306020483  fh=94%  nps=4.35M
>>              ext-> chk=89727132 cap=3344485 pp=1438540 1rep=4309007 mate=152018
>>              predicted=0  nodes=1306020483  evals=251560034
>>              endgame tablebase-> probes=0  hits=0
>>              SMP->  split=896  stop=116  data=6/128  cpu=9:59  elap=5:00
>>
>>
>>four processors:
>>
>>solution 1. bxc6
>>              time surplus   0.00  time limit 5:00 (5:00)
>>              depth   time  score   variation (1)
>>                5->   0.03   0.30   1. ... bxc6 2. Ne4 Nxe4 3. Qxg4+ Kc7
>>                                    4. Qf4+ Bd6 5. Qxf7+
>>                6     0.04     ++   1. ... bxc6!!
>>                6     0.06  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>>                                    7. Bxe4
>>                6->   0.06  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>>                                    7. Bxe4
>>                7     0.10     ++   1. ... bxc6!!
>>                7->   0.13  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>>                                    7. Bxe4 (s=3)
>>                8     0.28  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>>                                    7. Ba6 Qxd4+ (s=2)
>>                8->   0.34  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>>                                    4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>>                                    7. Ba6 Qxd4+
>>                9     0.89  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>                9->   1.02  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>               10     2.16  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>               10->   2.45  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>>                                    7. Rxe4 cxb5 8. Rxg4
>>               11     6.22  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>>               11->   7.08  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Bd6
>>               12    15.70  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>>               12->  38.28  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>>                                    (s=2)
>>               13     1:07  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>>               13->   1:28  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>>                                    4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>>                                    7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>>                                    (s=2)
>>               14     4:20  -0.41   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>>                                    4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>>                                    7. Qxe4 Qxe4 8. Rxe4 Rc8 9. Bd2 Nc4
>>                                    10. Bc3
>>              time=5:00  cpu=398%  mat=-1  n=2483899624  fh=93%  nps=8.28M
>>              ext-> chk=163866744 cap=6703544 pp=3608824 1rep=7938829 mate=298937
>>              predicted=0  nodes=2483899624  evals=565584863
>>              endgame tablebase-> probes=0  hits=0
>>              SMP->  split=34290  stop=5494  data=16/128  cpu=19:55  elap=5:00
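
The NPS figures can be recomputed from the node counts (the `n=` field) in the three logs, since every run used the full 5:00 (300 seconds). A short Python sketch of that arithmetic; `ELAPSED_S`, `nodes` and `nps` are names introduced here for illustration only.

```python
# Node counts (the n= field) from the three logs; every run used the full
# 5:00 time limit, so each count is divided by 300 seconds.
ELAPSED_S = 300
nodes = {1: 650_939_746, 2: 1_306_020_483, 4: 2_483_899_624}

def nps(n_nodes: int, seconds: float) -> float:
    """Nodes searched per second."""
    return n_nodes / seconds

base = nps(nodes[1], ELAPSED_S)
for cpus, n in nodes.items():
    rate = nps(n, ELAPSED_S)
    # scaling = n-cpu NPS relative to 1-cpu NPS; the ideal value equals `cpus`
    print(f"{cpus} cpu(s): {rate / 1e6:.2f}M nps, scaling {rate / base:.2f}")
```

For the four-processor log shown above this gives about 8.28M nodes per second (scaling about 3.8), matching that log's `nps=8.28M` line and a bit below the 8.41M in the summary table.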


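The difference between the two memory layouts described in the quoted post (NUMA mode with one contiguous 2 GB block per processor, SMP mode interleaving 4k pages across processors) can be illustrated with a toy address-to-node mapping. This is a simplified sketch, not how the BIOS or OS actually maps memory: it assumes four processors, physical addresses starting at 0, and node 0's block coming first. The function names are hypothetical.

```python
# Toy model of the two Opteron memory layouts described in the post.
# Assumptions (not from the post): 4 processors, physical addresses start at 0,
# and node 0's block comes first in NUMA mode. Real firmware/OS mappings differ.
PAGE = 4 * 1024          # interleave granularity: 4 KB pages
NODE_MEM = 2 * 1024**3   # NUMA mode: one contiguous 2 GB block per processor
NCPUS = 4

def home_node_numa(addr: int) -> int:
    """NUMA mode: each processor owns one contiguous 2 GB block."""
    return addr // NODE_MEM

def home_node_interleaved(addr: int, ncpus: int = NCPUS) -> int:
    """Interleaved ("SMP") mode: page 0 on cpu 0, page 1 on cpu 1, and so on."""
    return (addr // PAGE) % ncpus

def local_fraction(addrs, cpu: int, home_node) -> float:
    """Fraction of the given addresses that live in `cpu`'s local memory."""
    addrs = list(addrs)
    return sum(home_node(a) == cpu for a in addrs) / len(addrs)

# A 1 MB buffer at address 0, touched page by page by a thread on cpu 0.
buf = range(0, 1024 * 1024, PAGE)
print(local_fraction(buf, 0, home_node_numa))         # 1.0
print(local_fraction(buf, 0, home_node_interleaved))  # 0.25
```

In this toy model a NUMA-aware program that keeps each thread's data in its own node's block gets only local accesses, while interleaving spreads any buffer evenly, so roughly a quarter of it is local on four processors. That is the trade-off the post describes: interleaving is a safe default for programs that ignore memory placement.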
