Author: Dann Corbit
Date: 16:25:28 03/26/04
On March 26, 2004 at 17:41:07, Pat King wrote:

>On March 24, 2004 at 17:31:35, Dann Corbit wrote:
>
>>On March 24, 2004 at 16:53:08, Uri Blass wrote:
>>[snip]
>>>The difference is more important and 10-0 is clearly more telling than 19-11
>>
>>It is stronger, but less reliable. If it is true, then a dominant change may
>>have been found. But the odds that it is true are not nearly so strong as the
>>odds of 19-11 being true.
>
>Gotta go with Uri on this one. I calculated the following table assuming a 95%
>confidence factor. N is the number of games, W the number of wins needed to
>conclude one engine is better than the other. W is of course always rounded up.
>
>N    W
>5    5 (actually 4.3, so 4 out of 5 ain't bad!)
>10   8
>20   14
>30   20
>50   31
>100  59
>
>So I'm more confident that Uri's 10-0 result is significant than I am about your
>19-11 result.
>
>An interesting question is when do you give up? At 19-11 you're very close to
>meeting the 95% threshold. Suppose you play 20 more games and find yourself at
>30-20. Play 50 more and end up with 58-42.
>
>Another way to look at this: how confident do you need to be? After 100 games,
>my above example surely meets the 90% threshold. Question for the group: If you
>were to use a formal statistical method to evaluate your changes, what
>confidence level would you want to use? 90%? 95%? 99%?
>
>I may throw together a quick web page about this, if there's any interest.

I do not find results interesting until I have at least 30 games. At 30 games I can look for a trend, and I use 200 games to confirm it. It is a mistake to trust a 10-0 result, especially one against a single opponent, if you are trying to judge strength.
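Pat King's table appears to match the normal approximation to the binomial, W = N/2 + z·√N/2 with the one-sided 95% value z ≈ 1.645; that formula gives 4.3 for N = 5, the figure he quotes. This is a reconstruction, not necessarily his method. The sketch below assumes every game is decisive (no draws), games are independent, and the null hypothesis is two equally strong engines (win probability 0.5). The helper names `wins_needed` and `one_sided_p_value` are hypothetical. It also computes the exact one-sided binomial p-value for the two scores being argued over, plus the 90% check on 58-42.

```python
import math

# One-sided standard normal quantiles (assumed confidence levels).
Z_95 = 1.6449  # 95% one-sided
Z_90 = 1.2816  # 90% one-sided


def wins_needed(n_games, z=Z_95):
    """Normal approximation to Binomial(n, 0.5), i.e. two equal engines.

    Returns (raw threshold N/2 + z*sqrt(N)/2, threshold rounded up).
    Assumes every game is a decisive, independent win or loss.
    """
    raw = n_games / 2 + z * math.sqrt(n_games) / 2
    return raw, math.ceil(raw)


def one_sided_p_value(wins, n_games):
    """Exact binomial tail P(X >= wins) when each game is a coin flip."""
    tail = sum(math.comb(n_games, k) for k in range(wins, n_games + 1))
    return tail / 2 ** n_games


if __name__ == "__main__":
    # Rebuild the N / W table at 95% confidence.
    for n in (5, 10, 20, 30, 50, 100):
        raw, w = wins_needed(n)
        print(f"N={n:3d}  raw={raw:5.1f}  W={w}")

    # Exact one-sided p-values for the two disputed scores.
    print(f"10-0:  p = {one_sided_p_value(10, 10):.4f}")
    print(f"19-11: p = {one_sided_p_value(19, 30):.4f}")

    # Does 58-42 after 100 games clear the 90% threshold?
    print("58-42 meets 90%:", 58 >= wins_needed(100, Z_90)[1])
```

Run as a script, it reproduces the W column (5, 8, 14, 20, 31, 59) and gives p of about 0.001 for 10-0 and about 0.10 for 19-11. These numbers only measure how likely such a score is between equal engines under the assumptions above; they say nothing about whether a result against one opponent carries over to others, which is a separate concern.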
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.