Author: Dan Homan
Date: 08:36:56 08/14/98
On August 14, 1998 at 10:42:04, Bruce Moreland wrote:
>
>On August 14, 1998 at 06:08:36, Dan Homan wrote:
>
>>On August 12, 1998 at 09:15:19, Bruce Moreland wrote:
>>
>>>On August 11, 1998 at 06:52:02, Tony Hedlund wrote:
>>>
>>>>>So I think 4-0 actually turns out to be a significant result. If you score 4-0,
>>>>>you can say that there is a very good chance that the one with the wins is
>>>>>better than the ones with the losses.
>>>>>
>>>>>You can't say this if you pick out a string of 4 wins in a row in the midst of a
>>>>>longer match, since you might be selecting a fluke case, but if you just start
>>>>>from scratch, and get 4-0, you should be able to stop. In fact I think you
>>>>>might be able to stop if you get 3.5 - 0.5, but I am less certain of this case.
>>>>>Someone who has more statistics than I may be willing to comment on this.
>>>>
>>>>Recently I played the match Shredder2 P200 MMX 64MB - Rebel8 P90 16MB.
>>>>Rebel won the first four games but Shredder won the match with 11-9.
>>>
>>>That shouldn't happen very often.
>>>
>>>bruce
>
>If this does happen more often than it should, perhaps some effort could be
>expended to figure out why. I don't know why it would happen more often than it
>should.
>
>>The problem with small number statistics is that they can be very
>>mis-leading. A 4-0 result in a 4 game match between nearly equal
>>programs (with 20% draw chances) happens about 1/40 th of the time.
>>A 3.5-0.5 (or better) result happens about 1/13 th of a time.
>
>If you want to be able to say, "program A is stronger than program B, by at
>least a little bit", would you rather have a 4-0 result or a 105-95 result?
>
>You should get a bogus result from 4-0 only 2.5% of the time by your
>calculation,

Actually I think my calculation was wrong. I didn't consider the chance
of the weaker program going 4-0.... see below.

>how much do you want to bet that you'd get a bogus result from
>105-95 a lot more of the time?

Of course, but my point is that if you think you are learning anything useful from a 4-0 result in a 4 game match between top programs, you are wrong. I initially thought this was a great idea, but the more I think about it the worse it gets. Take the example I gave in my last post (I snipped it here). Using 4 game matches to decide between versions of a single program with small differences will lead to an essentially random evolution of the program!

The problem is to measure small differences in strength. Assume a marginally stronger program has a 42% chance of winning a game while its marginally weaker opponent has a 38% chance of winning a game (implicit is a 20% draw chance). The probability of the stronger program going 4-0 is 0.42^4, or about 1/32, but the probability of the weaker program going 4-0 is 0.38^4, or about 1/48. So if you get a 4-0 result from the match, and you had no idea beforehand which program was the stronger one, you are only 60% sure that the stronger program won!

Let's bump it up to 45%-35% chances of winning for the stronger/weaker program. If the match gives a 4-0 result you are still only about 73% sure that the stronger program won!

 - Dan
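
For anyone who wants to check the arithmetic in this post, here is a minimal Python sketch. It is not from the original message. The function names are made up for illustration, and it assumes that games are independent, that the per-game win chances stay fixed for the whole match, and that before the match either program was equally likely to be the stronger one.

```python
def p_sweep(p_win, games=4):
    """Chance of winning every game of a short match (independent games assumed)."""
    return p_win ** games


def p_at_least_3_5(p_win, p_draw):
    """Chance of scoring 3.5 or 4 out of 4: four wins, or three wins and one draw."""
    return p_win ** 4 + 4 * p_win ** 3 * p_draw


def confidence_stronger_won(p_strong, p_weak, games=4):
    """Given that somebody swept the match, the chance it was the stronger program.

    Assumes equal prior odds on which program is the stronger one.
    """
    strong = p_sweep(p_strong, games)
    weak = p_sweep(p_weak, games)
    return strong / (strong + weak)


if __name__ == "__main__":
    # Equal programs with 20% draws, as in the quoted earlier post.
    print("4-0 for one named side, equal programs: 1 in %.0f" % (1 / p_sweep(0.40)))
    print("3.5-0.5 or better, equal programs:      1 in %.0f" % (1 / p_at_least_3_5(0.40, 0.20)))

    # The two unequal cases discussed in the post.
    for p_strong, p_weak in [(0.42, 0.38), (0.45, 0.35)]:
        print("%.0f%%-%.0f%%: stronger sweeps 1 in %.0f, weaker sweeps 1 in %.0f, "
              "confidence after a 4-0 = %.1f%%" % (
                  100 * p_strong, 100 * p_weak,
                  1 / p_sweep(p_strong), 1 / p_sweep(p_weak),
                  100 * confidence_stronger_won(p_strong, p_weak)))
```

Under those assumptions the 42%-38% case comes out at roughly 60% and the 45%-35% case at roughly 73%. Passing a larger `games` value to `confidence_stronger_won` shows how slowly a clean sweep becomes convincing when the two programs are close in strength.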
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.