Author: Heinz van Kempen
Date: 04:01:49 01/26/06
On January 26, 2006 at 06:27:40, Rolf Tueschen wrote:

>On January 26, 2006 at 05:47:50, Heinz van Kempen wrote:
>
>>On January 25, 2006 at 17:53:50, Joseph Ciarrochi wrote:
>>
>>>I was just thinking. It might be useful to have a table of values that allow you
>>>to determine that one engine is significantly better than another. Is someone
>>>interested in putting this on their chess webpage? I can create the table for
>>>you.
>>>
>>>e.g. it would be something like this:
>>>
>>>Null hypothesis is: the engines do not differ, or, the win percentage is 50%
>>>
>>>We need to reject this hypothesis to conclude engines are different. Our
>>>criteria for rejecting could vary in strictness from 5% to .1%. (e.g., if the
>>>result would only occur .1% of the time assuming 50% true value, we reject the
>>>null and conclude that the engines do differ)
>>>
>>>The table could look something like this
>>>
>>>
>>>number of games    5% cutoff    1% cutoff    .1% cutoff
>>>
>>>20                 number       number       number
>>
>>Hi Joseph,
>>
>>very interesting observations and calculations. The CEGT group would be very
>>interested in adding your table(s) to the website.
>
>Perhaps a little academy would also be favorable in a certain sense.
>
>
>>
>>For me, I was always asking myself how probable it is to get these weird results
>>we are having from time to time and how these probabilities could be narrowed
>>down.
>
>The answer is clear. By sound application of statistics.
>
>
>
>>We already experimented with giving the same openings to White and Black in
>>consecutive games. It would also be possible to reduce errors by playing exactly
>>the same number of games against all other engines. On the other hand we do not
>>want to be slaves of stats and machinery (although we all find it interesting
>>and could play with numbers for hours) and also want to have fun and suspense with
>>tournaments. These tournaments, on the other hand, of course create less exact
>>values over time because the games are played more randomly, but I think with
>>thousands of games for one engine this hardly counts anymore.
>
>
>Yes - with a single caveat. You can't just continue like SSDF and then claim that
>now you already had 40000 games etc. If you make tests with always new entries
>and with "only" a couple of games per entry... This is NOT what stats
>understands under big numbers and their effects. In other words you can't claim
>validity with thousands of invalid results. Yes, you get numbers and results in
>the end, but what do they mean? I think we all overestimate the value of our
>little matches. You for example already have a differentiated presentation of
>rankings. For Blitz and slower games. But other aspects seem to be rather
>neglected. Only this way you keep your wellness, otherwise you must have admitted
>that you still can't say anything at all... at least not in this sort of lists.
>Now you could answer that others do it differently so that overall all aspects
>come to their right. But I don't believe in such automatism.
>
>
>
>>
>>Best Regards
>>Heinz

Hi Rolf,

we all have our beliefs and scepticisms. One thing is for sure, and all serious testers I have discussed this with agree: the perfect rating list does not exist and is not possible to achieve. The important thing is that there are efforts to offer something from which authors, customers and fans can draw more or less well-founded conclusions. Everyone has to draw these conclusions for himself and for his own needs.

I would say the best picture can be obtained by comparing the different attempts of the testers who are spending a lot of time on their hobby. SSDF by itself, or CEGT, or the CSS list, or Kurt's or Sedat's tournaments (just to mention a few) do not give the whole truth (if there is any). You have to compare them to get a more complete picture.

CEGT does not claim to be better or worse than other efforts. We are just a group of testers doing something in common and wanting to have fun with our hobby, and for people who like to compare, there is just another list that we can offer regularly and keep up to date.

Best Regards
Heinz
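
For readers who want to see what the cutoff table proposed in the quoted message might contain, here is a minimal Python sketch. It is not part of the original thread and is not Joseph's or CEGT's method. It makes two loud assumptions: draws are simply left out, so every counted game is a win or a loss, and the test is two-sided, because the null hypothesis is "the engines do not differ". The names `upper_tail`, `win_cutoff` and `cutoff_table` are made up for this illustration.

```python
from math import comb


def upper_tail(n, k):
    """P(X >= k) for X ~ Binomial(n, 0.5), i.e. under the 50% null hypothesis."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def win_cutoff(n, alpha):
    """Smallest number of wins k out of n decisive games whose two-sided
    p-value 2 * P(X >= k) is at most alpha. Returns None if even n wins
    out of n would not be significant at this level.

    ASSUMPTION: draws are ignored; n counts decisive games only.
    """
    for k in range(n // 2 + 1, n + 1):
        if 2 * upper_tail(n, k) <= alpha:
            return k
    return None


def cutoff_table(game_counts, alphas=(0.05, 0.01, 0.001)):
    """Map each number of games to its cutoffs at the 5%, 1% and .1% levels."""
    return {n: [win_cutoff(n, a) for a in alphas] for n in game_counts}


if __name__ == "__main__":
    print("games  5%  1%  .1%")
    for n, row in cutoff_table([20, 50, 100, 200]).items():
        print(n, *row)
```

Under these assumptions the row for 20 games comes out as 15, 17 and 18 wins. Real engine matches contain many draws, so a table meant for a rating site would need a model that accounts for them; this sketch only illustrates the idea of the table.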
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.