Author: John Sidles
Date: 14:28:15 02/18/06
Trickier than I thought! The previous table could have errors of up to one-half point. Here's the final (I hope) table, extending down to very short tournaments.

For programmers, the interesting thing is that a tournament as short as four games can give 95% confidence that a "tweak" has helped (or hurt) the program, *iff* the modified program sweeps (or loses) all four games.

The shortest tournament that yields 99.9% confidence is a seven-game "sweep" (or loss).

The shortest tournament in which a single draw is still compatible with 99.9% confidence is ten games, with a 9.5/10 score.

Here's the extended table:

nGame = 4, confidence = 95.%, score = 4./4
nGame = 5, confidence = 95.%, score = 4.5/5
nGame = 5, confidence = 99.%, score = 5./5
nGame = 6, confidence = 95.%, score = 5.5/6
nGame = 6, confidence = 99.%, score = 6./6
nGame = 7, confidence = 95.%, score = 6./7
nGame = 7, confidence = 99.%, score = 6.5/7
nGame = 7, confidence = 99.9%, score = 7./7
nGame = 8, confidence = 95.%, score = 6.5/8
nGame = 8, confidence = 99.%, score = 7.5/8
nGame = 8, confidence = 99.9%, score = 8./8
nGame = 9, confidence = 95.%, score = 7.5/9
nGame = 9, confidence = 99.%, score = 8./9
nGame = 9, confidence = 99.9%, score = 9./9
nGame = 10, confidence = 95.%, score = 8./10
nGame = 10, confidence = 99.%, score = 8.5/10
nGame = 10, confidence = 99.9%, score = 9.5/10
nGame = 15, confidence = 95.%, score = 11./15
nGame = 15, confidence = 99.%, score = 12./15
nGame = 15, confidence = 99.9%, score = 13./15
nGame = 20, confidence = 95.%, score = 14./20
nGame = 20, confidence = 99.%, score = 15./20
nGame = 20, confidence = 99.9%, score = 16.5/20
nGame = 30, confidence = 95.%, score = 20./30
nGame = 30, confidence = 99.%, score = 21./30
nGame = 30, confidence = 99.9%, score = 22.5/30
nGame = 40, confidence = 95.%, score = 25.5/40
nGame = 40, confidence = 99.%, score = 27./40
nGame = 40, confidence = 99.9%, score = 29./40
nGame = 50, confidence = 95.%, score = 31./50
nGame = 50, confidence = 99.%, score = 33./50
nGame = 50, confidence = 99.9%, score = 35./50
nGame = 75, confidence = 95.%, score = 45./75
nGame = 75, confidence = 99.%, score = 47./75
nGame = 75, confidence = 99.9%, score = 49.5/75
nGame = 100, confidence = 95.%, score = 58.5/100
nGame = 100, confidence = 99.%, score = 61./100
nGame = 100, confidence = 99.9%, score = 64./100

On February 18, 2006 at 17:07:12, John Sidles wrote:

>On February 18, 2006 at 16:52:34, John Sidles wrote:
>
>Here's the same table, with total score instead of the
>(less well-defined) "plus score"
>
>nGame = 10, confidence = 95.%, score = 7.5/10
>nGame = 10, confidence = 99.%, score = 8./10
>nGame = 10, confidence = 99.9%, score = 9./10
>
>nGame = 20, confidence = 95.%, score = 13.5/20
>nGame = 20, confidence = 99.%, score = 14.5/20
>nGame = 20, confidence = 99.9%, score = 16./20
>
>nGame = 30, confidence = 95.%, score = 19.5/30
>nGame = 30, confidence = 99.%, score = 20.5/30
>nGame = 30, confidence = 99.9%, score = 22./30
>
>nGame = 40, confidence = 95.%, score = 25./40
>nGame = 40, confidence = 99.%, score = 26.5/40
>nGame = 40, confidence = 99.9%, score = 28.5/40
>
>nGame = 50, confidence = 95.%, score = 30.5/50
>nGame = 50, confidence = 99.%, score = 32.5/50
>nGame = 50, confidence = 99.9%, score = 34.5/50
>
>nGame = 75, confidence = 95.%, score = 44.5/75
>nGame = 75, confidence = 99.%, score = 46.5/75
>nGame = 75, confidence = 99.9%, score = 49./75
>
>nGame = 100, confidence = 95.%, score = 58./100
>nGame = 100, confidence = 99.%, score = 60.5/100
>nGame = 100, confidence = 99.9%, score = 63.5/100
>
>
>
>>On February 18, 2006 at 03:50:09, Uri Blass wrote:
>>
>>>My question is based on your experience what is the biggest result that A beat B
>>>in match of 100 games (Noomen match or match based on other positions like Albert
>>>Silver's positions) but still A is not better than B against other programs.
>>
>>Here's a table for how large a plus score you need to see, by either A or B,
>>for you to be confident (at the given level of confidence) that this plus score
>>was not due to luck.
>>
>>Here "luck" means that A and B actually each have 1/3 chance of win, lose and
>>draw, but that either program was simply lucky enough to achieve a plus score.
>>
>>nGame = 10, confidence = 95.%, score = +2.5
>>nGame = 10, confidence = 99.%, score = +3.
>>nGame = 10, confidence = 99.9%, score = +4.
>>nGame = 20, confidence = 95.%, score = +3.5
>>nGame = 20, confidence = 99.%, score = +4.5
>>nGame = 20, confidence = 99.9%, score = +6.
>>nGame = 30, confidence = 95.%, score = +4.5
>>nGame = 30, confidence = 99.%, score = +5.5
>>nGame = 30, confidence = 99.9%, score = +7.
>>nGame = 40, confidence = 95.%, score = +5.
>>nGame = 40, confidence = 99.%, score = +6.5
>>nGame = 40, confidence = 99.9%, score = +8.5
>>nGame = 50, confidence = 95.%, score = +5.5
>>nGame = 50, confidence = 99.%, score = +7.5
>>nGame = 50, confidence = 99.9%, score = +9.5
>>nGame = 75, confidence = 95.%, score = +7.
>>nGame = 75, confidence = 99.%, score = +9.
>>nGame = 75, confidence = 99.9%, score = +11.5
>>nGame = 100, confidence = 95.%, score = +8.
>>nGame = 100, confidence = 99.%, score = +10.5
>>nGame = 100, confidence = 99.9%, score = +13.5
>>
>>So for example, in a 10 game tournament, if either
>>program achieves a +4 score (or higher), you can be
>>99.9% confident that such a high score was *not*
>>due to luck.
>>
>>Note: these were calculated by brute force in
>>Mathematica.
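
For anyone who wants to redo the calculation without Mathematica, here is a rough Python sketch of one way such a table could be computed. It is not the original Mathematica code. It assumes the "luck" model stated above (each game is a win, draw, or loss with probability 1/3 each, games independent) and a two-sided test, meaning half of the allowed error is assigned to each program; that reading is an inference from the phrase "by either A or B", not something the post spells out. The function names `score_distribution` and `min_significant_score` are made up for illustration. Under those assumptions it gives 4/4 for four games at 95% and 9.5/10 for ten games at 99.9%.

```python
from fractions import Fraction


def score_distribution(n_games):
    """Exact distribution of one program's total score, in half-points,
    assuming win, draw and loss each have probability 1/3 per game."""
    third = Fraction(1, 3)
    dist = [Fraction(1)]  # before any game: 0 half-points with probability 1
    for _ in range(n_games):
        new = [Fraction(0)] * (len(dist) + 2)
        for half_points, p in enumerate(dist):
            new[half_points] += p * third      # loss adds 0 half-points
            new[half_points + 1] += p * third  # draw adds 1 half-point
            new[half_points + 2] += p * third  # win adds 2 half-points
        dist = new
    return dist


def min_significant_score(n_games, confidence, two_sided=True):
    """Smallest total score whose upper-tail probability under the 'luck'
    model stays within the allowed error. Returns None if even a clean
    sweep is not enough. `confidence` is a decimal string such as "0.95".
    two_sided=True is an assumption: the error budget is split evenly
    between the two programs."""
    alpha = 1 - Fraction(confidence)
    if two_sided:
        alpha /= 2
    dist = score_distribution(n_games)
    tail = Fraction(0)
    best = None
    # Walk down from a clean sweep, accumulating P(score >= current score).
    for half_points in range(2 * n_games, -1, -1):
        tail += dist[half_points]
        if tail > alpha:
            break
        best = half_points
    return None if best is None else best / 2


if __name__ == "__main__":
    for n in (4, 5, 6, 7, 8, 10):
        for conf in ("0.95", "0.99", "0.999"):
            score = min_significant_score(n, conf)
            if score is not None:
                print(f"nGame = {n}, confidence = {conf}, score = {score}/{n}")
```

Exact fractions are used instead of floats so that borderline cases are not decided by rounding. Passing `two_sided=False` gives the one-sided version, which asks only whether one particular program was lucky and so yields lower thresholds in some rows.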
Last modified: Thu, 15 Apr 21 08:11:13 -0700