Author: Mark Rawlings
Date: 12:14:40 01/30/04
I'm probably missing something, but how can two processors search to a given depth _more_ than twice as fast as one processor? That is, 4:52 to get through 13 ply with one processor vs. only 1:58 with two processors. (Maybe there is an element of luck as to what gets found in the hash?)

Thanks,
Mark

On January 30, 2004 at 14:53:11, Robert Hyatt wrote:

>I thought I would check on the NPS and SMP scaling, just to see how the opteron
>was doing compared to my other Intel boxes. Recall that my quad xeon scaled NPS
>almost perfectly, but that originally my dual xeon 2.8 did much worse. However,
>also recall that recent changes have improved the xeon scaling as well. Here is
>how the Opteron does, which shows it is not having any memory/cache
>bottlenecks. This is _one_ of the cray blitz positions from the DTS paper. I
>have run them all, but didn't want to post that much raw data here although I
>can email you 1 run with 1 cpu, four with 2 cpus and four with 4 cpus if you
>want something to look at.
>
>Here's the summary, you can look at the raw data below (this is position 5,
>chosen because I ran each test for 5 minutes, and this position was the first to
>reach depth=13 with 1 processor).
>
>         1cpu    2cpus   4cpus
>nps      2.17M   4.35M   8.41M
>scale    1.0     2.0     3.9
>
>I did not look at the actual parallel speedup, although the data for this
>one position is given below. Note that the speedup time has significant
>variance, while the NPS is the better comparison to see how the hardware is
>actually doing. IE 3.9 above means that 4 processors are not getting in each
>other's way much at all. Note that the 2.0 is as good as it can get, of course.
> Which simply shows that the opteron is doing well in a parallel sense.
>
>Note that for those in the know, this machine is running in NUMA mode, not SMP
>mode (NUMA sets two contiguous gigabytes of RAM per processor, SMP mode
>interleaves 4k pages so that page 0 is on cpu 0, page 1 on cpu1, etc...
>that is probably better for non-NUMA-aware programs... NUMA mode is better if you take
>care to get things into local memory better...
>
>More later.
>
>
>one processor:
>
>solution 1. bxc6
>              time surplus   0.00  time limit 5:00 (5:00)
>              depth   time  score   variation (1)
>               6     0.10     ++   1. ... bxc6!!
>               6     0.14  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                   7. Bxe4
>               6->   0.15  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                   7. Bxe4
>               7     0.28     ++   1. ... bxc6!!
>               7->   0.34  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                   7. Bxe4
>               8     0.80  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                   7. Ba6 Qxd4+
>               8->   0.91  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                   7. Ba6 Qxd4+
>               9     2.89  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>               9->   3.19  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>              10     7.40  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>              10->   8.18  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>              11    25.29  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Bd6
>              11->  28.17  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Bd6
>              12     1:00  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>              12->   2:13  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>              13     4:02  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                   4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                   7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>              13->   4:52  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                   4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                   7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>              time=5:00  cpu=99%  mat=-1  n=650939746  fh=94%  nps=2.17M
>              ext-> chk=42151085 cap=1623638 pp=804158 1rep=1876597 mate=63368
>              predicted=0  nodes=650939746  evals=128158996
>              endgame tablebase-> probes=0  hits=0
>              SMP->  split=0  stop=0  data=0/128  cpu=4:59  elap=5:00
>----------------------> solution correct (5/5).
>
>
>two processors:
>solution 1. bxc6
>              time surplus   0.00  time limit 5:00 (5:00)
>              depth   time  score   variation (1)
>               6     0.06     ++   1. ... bxc6!!
>               6     0.08  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                   7. Bxe4
>               6->   0.08  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                   7. Bxe4
>               7     0.16     ++   1. ... bxc6!!
>               7->   0.19  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                   7. Bxe4 (s=3)
>               8     0.45  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                   7. Ba6 Qxd4+ (s=2)
>               8->   0.51  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                   7. Ba6 Qxd4+
>               9     1.43  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>               9->   1.59  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>              10     3.60  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>              10->   3.99  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>              11    11.35  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Bd6
>              11->  21.85  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Bd6 (s=2)
>              12    39.37  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>              12->  45.05  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>              13     1:40  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                   4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                   7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>              13->   1:58  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                   4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                   7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>              time=5:00  cpu=199%  mat=-1  n=1306020483  fh=94%  nps=4.35M
>              ext-> chk=89727132 cap=3344485 pp=1438540 1rep=4309007 mate=152018
>              predicted=0  nodes=1306020483  evals=251560034
>              endgame tablebase-> probes=0  hits=0
>              SMP->  split=896  stop=116  data=6/128  cpu=9:59  elap=5:00
>
>
>four cpus:
>
>solution 1. bxc6
>              time surplus   0.00  time limit 5:00 (5:00)
>              depth   time  score   variation (1)
>               5->   0.03   0.30   1. ... bxc6 2. Ne4 Nxe4 3. Qxg4+ Kc7
>                                   4. Qf4+ Bd6 5. Qxf7+
>               6     0.04     ++   1. ... bxc6!!
>               6     0.06  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                   7. Bxe4
>               6->   0.06  -0.12   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                   7. Bxe4
>               7     0.10     ++   1. ... bxc6!!
>               7->   0.13  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Qf5 6. Bd3 Qxe4
>                                   7. Bxe4 (s=3)
>               8     0.28  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                   7. Ba6 Qxd4+ (s=2)
>               8->   0.34  -0.55   1. ... bxc6 2. Ne4 f2+ 3. Kxf2 Nxe4+
>                                   4. Qxe4 Qf6+ 5. Kg1 Bf5 6. Qf3 Bd6
>                                   7. Ba6 Qxd4+
>               9     0.89  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>               9->   1.02  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>              10     2.16  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>              10->   2.45  -0.71   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qd5+ Kc7 6. Qxe4 Qxe4
>                                   7. Rxe4 cxb5 8. Rxg4
>              11     6.22  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Bd6
>              11->   7.08  -0.63   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Bd6
>              12    15.70  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>              12->  38.28  -0.51   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Re8 9. Rxe8 Kxe8
>                                   (s=2)
>              13     1:07  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                   4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                   7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>              13->   1:28  -0.42   1. ... bxc6 2. Ne4 f2+ 3. Nxf2 Bd6
>                                   4. Qh6 cxb5 5. Bg5 Be7 6. Rxe7+ Qxe7
>                                   7. Bxf6 Qe6 8. Qg5 Be2 9. Bxh8 Rxh8
>                                   (s=2)
>              14     4:20  -0.41   1. ... bxc6 2. Ne4 f2+ 3. Qxf2 Nxe4
>                                   4. Qxf7+ Qe7 5. Qf4 cxb5 6. Qxg4+ Qe6
>                                   7. Qxe4 Qxe4 8. Rxe4 Rc8 9. Bd2 Nc4
>                                   10. Bc3
>              time=5:00  cpu=398%  mat=-1  n=2483899624  fh=93%  nps=8.28M
>              ext-> chk=163866744 cap=6703544 pp=3608824 1rep=7938829 mate=298937
>              predicted=0  nodes=2483899624  evals=565584863
>              endgame tablebase-> probes=0  hits=0
>              SMP->  split=34290  stop=5494  data=16/128  cpu=19:55  elap=5:00
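
For anyone who wants to check the ratios being discussed, here is a minimal sketch in Python that recomputes them from the figures quoted in this post. The helper names (`to_seconds`, `nps_scale`, `time_speedup`) are made up for illustration. The NPS values are taken from the summary table, and the clock readings are the completed depth-13 lines (`13->`) of each run.

```python
def to_seconds(clock: str) -> int:
    """Convert an 'm:ss' clock reading from the search log into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


# Figures copied from the quoted post (summary table and the 13-> lines).
NPS = {1: 2.17e6, 2: 4.35e6, 4: 8.41e6}
DEPTH13_DONE = {1: "4:52", 2: "1:58", 4: "1:28"}


def nps_scale(cpus: int) -> float:
    """Nodes-per-second ratio relative to the one-processor run."""
    return NPS[cpus] / NPS[1]


def time_speedup(cpus: int) -> float:
    """Time-to-finish-depth-13 ratio relative to the one-processor run."""
    return to_seconds(DEPTH13_DONE[1]) / to_seconds(DEPTH13_DONE[cpus])


if __name__ == "__main__":
    for cpus in (1, 2, 4):
        print(
            f"{cpus} cpu(s): NPS scale {nps_scale(cpus):.1f}, "
            f"time-to-depth-13 speedup {time_speedup(cpus):.2f}"
        )
```

On these numbers the NPS ratio for two processors is about 2.0 while the time-to-depth ratio is about 2.5, which is the gap the question is about. This is a single position, and the quoted post itself says the speedup time has significant variance.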
Last modified: Thu, 15 Apr 21 08:11:13 -0700