Author: Robert Hyatt
Date: 11:53:11 01/30/04
I thought I would check the NPS and SMP scaling, just to see how the Opteron
is doing compared to my Intel boxes. Recall that my quad Xeon scaled NPS
almost perfectly, but that originally my dual Xeon 2.8 did much worse. However,
also recall that recent changes have improved the Xeon scaling as well. Here is
how the Opteron does, which suggests it is not having any memory/cache
bottlenecks. This is _one_ of the Cray Blitz positions from the DTS paper. I
have run them all but didn't want to post that much raw data here, although I
can email you one run with 1 cpu, four with 2 cpus and four with 4 cpus if you
want something to look at.
Here's the summary; you can look at the raw data below. (This is position 5,
chosen because I ran each test for 5 minutes, and this position was the first to
reach depth=13 with 1 processor.)
         1cpu    2cpus   4cpus
nps      2.17M   4.35M   8.41M
scale    1.0     2.0     3.9
I did not look at the actual parallel speedup, although the data for this
one position is given below. Note that the time-to-depth speedup has significant
variance, so NPS is the better comparison for seeing how the hardware is
actually doing. I.e., the 3.9 above means that 4 processors are not getting in
each other's way much at all, and the 2.0 is as good as it can get, of course.
This simply shows that the Opteron is doing well in a parallel sense.
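
For anyone who wants to redo the arithmetic, here is a minimal sketch of how the "scale" row follows from the "nps" row. The NPS figures are the ones in the summary table; the function name and the extra per-cpu efficiency figure are only illustrative and are not something Crafty reports.

```python
# NPS figures from the summary table above, in nodes per second.
nps = {1: 2.17e6, 2: 4.35e6, 4: 8.41e6}

def nps_scaling(nps_by_cpus):
    """Return {cpus: (scale, efficiency)} relative to the 1-cpu rate.

    scale      = nps with N cpus / nps with 1 cpu
    efficiency = scale / N (1.0 would mean perfect NPS scaling)
    """
    base = nps_by_cpus[1]
    return {n: (rate / base, rate / base / n) for n, rate in nps_by_cpus.items()}

for cpus, (scale, eff) in sorted(nps_scaling(nps).items()):
    print(f"{cpus} cpu(s): scale {scale:.1f}, efficiency {eff:.0%}")
```

With these inputs the scale values round to 1.0, 2.0 and 3.9, matching the table.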
Note that, for those in the know, this machine is running in NUMA mode, not SMP
mode. NUMA mode gives each processor two contiguous gigabytes of RAM; SMP mode
interleaves 4k pages so that page 0 is on cpu 0, page 1 on cpu 1, etc. The
latter is probably better for non-NUMA-aware programs; NUMA mode is better if
you take care to get things into local memory.
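
To make the difference between the two layouts concrete, here is a toy model of which processor's memory a given physical address lands on in each mode. It is only a sketch: the 4k page size and the 2 GB per processor come from the description above, while the node count of 4 and all of the function names are assumptions made for illustration, not anything taken from the BIOS or the kernel.

```python
PAGE_SIZE = 4 * 1024        # 4k pages, as described above
NODE_BYTES = 2 * 1024**3    # 2 GB of contiguous RAM per processor
NUM_NODES = 4               # assumption: one memory node per cpu on a quad box

def node_contiguous(addr):
    """NUMA mode: each processor owns one contiguous 2 GB block."""
    return addr // NODE_BYTES

def node_interleaved(addr):
    """SMP mode: consecutive 4k pages rotate across the processors."""
    return (addr // PAGE_SIZE) % NUM_NODES

def nodes_touched(mapper, start, length):
    """Sorted list of nodes that hold pages of a buffer of `length` bytes."""
    return sorted({mapper(a) for a in range(start, start + length, PAGE_SIZE)})

# A 64 KB buffer at address 0 under each layout.
print("NUMA mode:", nodes_touched(node_contiguous, 0, 64 * 1024))
print("SMP mode: ", nodes_touched(node_interleaved, 0, 64 * 1024))
```

In this model the buffer sits entirely on one node in NUMA mode and is spread over all four in SMP mode. That is why interleaving evens things out for a program that ignores placement, while a NUMA-aware program can keep each thread's data local.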
More later.
one processor:
solution 1. bxc6
time surplus 0.00 time limit 5:00 (5:00)
depth time score variation (1)
6 0.10 ++ 1. ... bxc6!!
6 0.14 -0.12 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
7. Bxe4
6-> 0.15 -0.12 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
7. Bxe4
7 0.28 ++ 1. ... bxc6!!
7-> 0.34 -0.51 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
7. Bxe4
8 0.80 -0.55 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
7. Ba6 Qxd4+
8-> 0.91 -0.55 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
7. Ba6 Qxd4+
9 2.89 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
9-> 3.19 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
10 7.40 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
10-> 8.18 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
11 25.29 -0.63 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Bd6
11-> 28.17 -0.63 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Bd6
12 1:00 -0.51 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
12-> 2:13 -0.51 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
13 4:02 -0.42 1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
13-> 4:52 -0.42 1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
time=5:00 cpu=99% mat=-1 n=650939746 fh=94% nps=2.17M
ext-> chk=42151085 cap=1623638 pp=804158 1rep=1876597 mate=63368
predicted=0 nodes=650939746 evals=128158996
endgame tablebase-> probes=0 hits=0
SMP-> split=0 stop=0 data=0/128 cpu=4:59 elap=5:00
----------------------> solution correct (5/5).
two processors:
solution 1. bxc6
time surplus 0.00 time limit 5:00 (5:00)
depth time score variation (1)
6 0.06 ++ 1. ... bxc6!!
6 0.08 -0.12 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
7. Bxe4
6-> 0.08 -0.12 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
7. Bxe4
7 0.16 ++ 1. ... bxc6!!
7-> 0.19 -0.51 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
7. Bxe4 (s=3)
8 0.45 -0.55 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
7. Ba6 Qxd4+ (s=2)
8-> 0.51 -0.55 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
7. Ba6 Qxd4+
9 1.43 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
9-> 1.59 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
10 3.60 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
10-> 3.99 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
11 11.35 -0.63 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Bd6
11-> 21.85 -0.63 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Bd6 (s=2)
12 39.37 -0.51 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
12-> 45.05 -0.51 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
13 1:40 -0.42 1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
13-> 1:58 -0.42 1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
time=5:00 cpu=199% mat=-1 n=1306020483 fh=94% nps=4.35M
ext-> chk=89727132 cap=3344485 pp=1438540 1rep=4309007 mate=152018
predicted=0 nodes=1306020483 evals=251560034
endgame tablebase-> probes=0 hits=0
SMP-> split=896 stop=116 data=6/128 cpu=9:59 elap=5:00
four cpus:
solution 1. bxc6
time surplus 0.00 time limit 5:00 (5:00)
depth time score variation (1)
5-> 0.03 0.30 1. ... bxc6 2. Ne4 Nxe4 3. Qxg4+ Kc7
4. Qf4+ Bd6 5. Qxf7+
6 0.04 ++ 1. ... bxc6!!
6 0.06 -0.12 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
7. Bxe4
6-> 0.06 -0.12 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
7. Bxe4
7 0.10 ++ 1. ... bxc6!!
7-> 0.13 -0.51 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
7. Bxe4 (s=3)
8 0.28 -0.55 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
7. Ba6 Qxd4+ (s=2)
8-> 0.34 -0.55 1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
7. Ba6 Qxd4+
9 0.89 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
9-> 1.02 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
10 2.16 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
10-> 2.45 -0.71 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
7. Rxe4 cxb5 8. Rxg4
11 6.22 -0.63 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Bd6
11-> 7.08 -0.63 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Bd6
12 15.70 -0.51 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
12-> 38.28 -0.51 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
(s=2)
13 1:07 -0.42 1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
13-> 1:28 -0.42 1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
(s=2)
14 4:20 -0.41 1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
7. Qxe4 Qxe4 8. Rxe4 Rc8 9. Bd2 Nc4
10. Bc3
time=5:00 cpu=398% mat=-1 n=2483899624 fh=93% nps=8.28M
ext-> chk=163866744 cap=6703544 pp=3608824 1rep=7938829 mate=298937
predicted=0 nodes=2483899624 evals=565584863
endgame tablebase-> probes=0 hits=0
SMP-> split=34290 stop=5494 data=16/128 cpu=19:55 elap=5:00
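
For those who do want a rough time-to-depth speedup from the raw data above, here is a small sketch that uses the time at which depth 13 completed (the "13->" lines) in each run. The helper name is made up for illustration, and it assumes the log prints times under a minute as seconds with hundredths and longer times as m:ss, which is how the output above appears to behave. Since the m:ss times are only given to the second, the results are approximate.

```python
def to_seconds(stamp):
    """Convert a log time such as '4:52' or '28.17' to seconds."""
    if ":" in stamp:
        minutes, seconds = stamp.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(stamp)

# Time at which depth 13 completed (the "13->" lines) in each run above.
depth13 = {1: "4:52", 2: "1:58", 4: "1:28"}

base = to_seconds(depth13[1])
speedup = {cpus: base / to_seconds(t) for cpus, t in depth13.items()}

for cpus in sorted(speedup):
    print(f"{cpus} cpu(s): depth 13 done at {depth13[cpus]}, "
          f"speedup {speedup[cpus]:.2f}")
```

For this one position that gives roughly 2.5 with two processors and 3.3 with four. The two-processor figure coming out above 2 is an example of the variance in time-to-depth numbers for a single position, which is why the NPS comparison is the steadier measure of the hardware.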