Author: Rolf Tueschen
Date: 03:27:40 01/26/06
On January 26, 2006 at 05:47:50, Heinz van Kempen wrote:

>On January 25, 2006 at 17:53:50, Joseph Ciarrochi wrote:
>
>>I was just thinking. It might be useful to have a table of values that allows you
>>to determine whether one engine is significantly better than another. Is someone
>>interested in putting this on their chess webpage? I can create the table for
>>you.
>>
>>For example, it would be something like this:
>>
>>Null hypothesis: the engines do not differ, or, the win percentage is 50%.
>>
>>We need to reject this hypothesis to conclude that the engines are different. Our
>>criterion for rejection could vary in strictness from 5% to 0.1% (e.g., if the
>>result would occur only 0.1% of the time assuming a true value of 50%, we reject
>>the null and conclude that the engines do differ).
>>
>>The table could look something like this:
>>
>>number of games   5% cutoff   1% cutoff   0.1% cutoff
>>20                number      number      number
>
>Hi Joseph,
>
>very interesting observations and calculations. The CEGT group would be very
>interested in adding your table(s) to the website.

Perhaps a little academic thinking would also be beneficial, in a certain sense.

>
>For me, I was always asking myself how probable it is to get these weird results
>we have from time to time, and how these probabilities could be narrowed
>down.

The answer is clear: by sound application of statistics.

>We already experimented with giving the same openings to White and Black in
>consecutive games. It would also be possible to reduce errors by playing exactly
>the same number of games against all other engines. On the other hand, we do not
>want to be slaves of stats and machinery (although we all find it interesting
>and could play with numbers for hours), and we also want fun and suspense from
>tournaments. These tournaments, with their more randomly distributed games over
>time, of course produce less exact values, but I think with thousands of games
>for one engine this hardly matters anymore.

Yes, with a single caveat.
You can't just carry on like the SSDF and then claim that you now have 40,000 games, etc. If you keep testing ever-new entries with "only" a couple of games per entry, that is NOT what statistics means by large numbers and their effects. In other words, you can't claim validity on the basis of thousands of invalid results. Yes, you get numbers and results in the end, but what do they mean? I think we all overestimate the value of our little matches. You, for example, already present differentiated rankings for blitz and for slower games. But other aspects seem to be rather neglected. Only that way can you stay comfortable; otherwise you would have to admit that you still can't say anything at all... at least not in this sort of list. Now you could answer that others do it differently, so that overall every aspect gets its due. But I don't believe in such automatism.
>
>Best Regards
>Heinz
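Joseph's proposed cutoff table can be computed mechanically. What follows is a minimal Python sketch, not the table Joseph or CEGT actually produced. It assumes an exact two-sided sign test: draws are dropped, "games" means decisive games only (wins plus losses), and the null hypothesis is that each decisive game is a 50/50 coin flip. The function names `two_sided_p` and `min_wins_to_reject` are hypothetical.

```python
from math import comb


def two_sided_p(wins: int, n: int) -> float:
    """Exact two-sided sign-test p-value for `wins` out of `n` decisive games.

    Assumption: H0 says each decisive game is won by either engine with
    probability 0.5 (draws have already been removed).
    """
    k = max(wins, n - wins)  # under p = 0.5 the two tails are mirror images
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


def min_wins_to_reject(n: int, alpha: float):
    """Smallest win count out of n that rejects H0 at level alpha, or None."""
    for wins in range((n + 1) // 2, n + 1):
        if two_sided_p(wins, n) <= alpha:
            return wins
    return None  # even an n-0 sweep is not significant at this level


if __name__ == "__main__":
    alphas = (0.05, 0.01, 0.001)
    # Header row: number of decisive games, then one column per cutoff.
    print(f"{'games':>6}" + "".join(f"{a:>9}" for a in alphas))
    for n in (10, 20, 50, 100, 200, 500):
        cells = []
        for a in alphas:
            w = min_wins_to_reject(n, a)
            cells.append(f"{'-' if w is None else w:>9}")
        print(f"{n:>6}" + "".join(cells))
```

Under these assumptions, 20 decisive games need at least 15 wins (a 15-5 result) to reject the null at 5%, 17 wins at 1%, and 18 wins at 0.1%; with only 10 decisive games no result at all reaches the 0.1% level. Counting draws as half points would need a different model, for example a normal approximation on the score.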
Last modified: Thu, 15 Apr 21 08:11:13 -0700