Author: Sandro Necchi
Date: 01:13:21 01/06/04
On January 04, 2004 at 19:22:43, Ricardo Gibert wrote:

>On January 04, 2004 at 14:57:59, Mike Byrne wrote:
>
>>On January 04, 2004 at 13:46:48, Ricardo Gibert wrote:
>>
>>>On January 04, 2004 at 12:47:25, Peter Berger wrote:
>>>
>>>>On January 04, 2004 at 12:40:00, Ricardo Gibert wrote:
>>>>
>>>>>On January 04, 2004 at 12:29:15, Mark Young wrote:
>>>>>
>>>>>>On January 04, 2004 at 11:46:00, Roger Brown wrote:
>>>>>>
>>>>>>>Hello all,
>>>>>>>
>>>>>>>I have read numerous posts about the validity - or lack thereof actually - of
>>>>>>>short matches between and among chess engines. The arguments of those who say
>>>>>>>that such matches are meaningless (Kurt Utzinger, Christopher Theron, Robert
>>>>>>>Hyatt et al.) typically indicate that well over 200 games are required to make any
>>>>>>>sort of statistical statement that engine X is better than engine Y.
>>>>>>>
>>>>>>>I concede this point.
>>>>>>
>>>>>>If you concede this point you don't understand. There is no magic number like
>>>>>>200 or 2000. The score must be considered. Here is an example:
>>>>>>
>>>>>>A score of 17 - 3 in a 20 game match has a certainty of over 99% that the winner
>>>>>>of the match is stronger than the loser.
>>>>>>
>>>>>>A 100 game match ending 55 - 45 only has an 81% chance that the winner of the
>>>>>>match is the stronger program.
>>>>>>
>>>>>>A 200 game match ending 106 - 94 only has a 78% chance that the winner is
>>>>>>stronger than the loser.
>>>>>
>>>>>
>>>>>Nothing you have said is really correct because you have ignored the significant
>>>>>effect of draws in a match.
>>>>
>>>>The percentage of draws doesn't matter at all when it is about the conclusion
>>>>which program is strongest based on the above match results.
>>>>
>>>>This has been shown by Remi Coulom and explained in multiple posts
>>>>here (unfortunately the search engine hasn't found a new home yet).
>>>>
>>>>6-0 with 0 draws and 6-0 with 1000 draws has the exact same prediction value
>>>>when it is about the question which engine is stronger based on a match result.
>>>
>>>In this case, the number of decisive games (W+L=6) and margin of victory (W-L=6)
>>>is the same in both cases so the conclusion they have equal value is correct.
>>>
>>> -------------------------------
>>>
>>>In the examples given before, the number of decisive games depends on the number
>>>of draws e.g. +17-3=0 and +14-0=6 are not of equal value since the numbers of
>>>decisive games are not equal.
>>>
>>>Let's take a more obvious example. Let's say we play a 1000 game match and I win
>>>by +20-0=980. I only score 51%, but if we then play a short match, your chances
>>>of winning such a match are virtually zero, since the longer match has clearly
>>>demonstrated you couldn't win a game if your life depended on it.
>>
>>But if your team needed a half point for you to win the Olympiad, this is the match
>>up you wanted - a half point is a "shoo in" and you are the champs. Sometimes a
>>draw is more important than a win and (in the example I used) is just as good as
>>a win.
>>
>>Let's call the losing program "drawmaster"
>>
>>
>> 98% of the games will end in draw - a coinflip that lands on the edge?
>>
>>
>>>
>>>Now compare this with the alternative possibility. We play a 1000 game match and
>>>I win +510-490=0. Again 51%. Now we play a short match afterward, the match
>>>outcome will be very nearly a virtual coin flip.
>>
>>Let's call this losing program "win_or_die"
>>
>>>
>>>The first match is very convincing in demonstrating superiority. It is just as
>>>effective as +20-0=0 is as per Remi.
>>
>>You may think so, but at the end of the day, Dr Elo will have program
>>"drawmaster" rated exactly the same as "win_or_die" --- and ratings are what we
>>were talking about here.
>>Which program you may want to use may be based on
>>whether you need the win or a draw, if you need the draw, go with drawmaster,
>>if you need the full point, your chances are better with "win_or_die".
>
>Ratings are not what I was responding to. Among the many erroneous things Mark
>Young said, "A 100 game match ending 55 - 45 only has an 81% chance that the
>winner of the match is the stronger program." This is a very specific statement
>dealing with whether a given player is better or not.

Well, if this refers to two chess programs playing each other, that figure may be optimistic or simply not true. With not too many games (<300), a better figure would come from playing at least 6 different opponents. There are cases where a program performs quite well against one opponent but not so well against others. That could change the final figure more than you think.

Sandro

>Nothing to do with ratings
>in that statement. He _cannot_ provide a figure like "81%" without considering
>the percentage of games ending in draw. That's the type of mistake I directed
>myself towards.
>
>>
>>>
>>>The second match is very unconvincing in demonstrating my superiority. It showed
>>>a game between us is a virtual coin flip.
>>>
>>>Draws matter a lot, but you need to understand just how. I'm very familiar with
>>>what Remi has said on this and it was quite correct. The trouble is people
>>>misunderstand what he has said.
>>>
>>>If you have understood the above, you will then understand that my remark to
>>>Mark Young was right on the money.
>>
>>I understand the above, but you are mixing apples and oranges and in the context
>>of the discussion taking place, your post was not on the money. It's really a
>>different subject (imo) and you just added unneeded confusion to a discussion.
>>
>
>I'm baffled as to why you think I'm mixing apples and oranges. I think you need
>to read through the thread again more carefully.
>If you do, you will find I
>cleared away some misconceptions rather than "...added unneeded confusion..."
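
To make the disagreement over draws concrete, here is a minimal Python sketch of one common way to estimate the chance that the match winner is really the stronger side. It rests on assumptions that are not stated anywhere in the thread: a uniform prior on the win probability, independent games, and draws dropped entirely (which is the result attributed to Remi Coulom above). The function name likelihood_of_superiority is made up for illustration, and this is not claimed to be how the 99%, 81% and 78% figures quoted above were computed.

```python
from math import comb


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """P(true win probability > 0.5) given only the decisive games.

    Assumes a uniform prior on the win probability, so the posterior is
    Beta(wins + 1, losses + 1). For integer parameters its tail at 0.5 equals
    a binomial sum: P(Bin(n + 1, 1/2) <= wins), with n = wins + losses.
    Draws are deliberately not an argument.
    """
    n = wins + losses
    favourable = sum(comb(n + 1, k) for k in range(wins + 1))
    return favourable / 2 ** (n + 1)


# (label, wins, losses, draws); the draw count is shown but never used.
matches = [
    ("6-0, no draws", 6, 0, 0),
    ("6-0, 1000 draws", 6, 0, 1000),
    ("17-3 as +17-3=0", 17, 3, 0),
    ("17-3 as +14-0=6", 14, 0, 6),
    ("51% as +20-0=980", 20, 0, 980),
    ("51% as +510-490=0", 510, 490, 0),
    ("55-45 as +55-45=0", 55, 45, 0),
    ("55-45 as +30-20=50", 30, 20, 50),
]

for label, wins, losses, draws in matches:
    los = likelihood_of_superiority(wins, losses)
    print(f"{label:22s} decisive={wins + losses:4d} draws={draws:4d} LOS={los:.4f}")
```

Under these assumptions both readings in the thread hold at once: two matches with the same wins and losses get the same value whatever the number of draws, while the same overall score (say 55 - 45) gets a different value depending on how it splits into wins, losses and draws. It says nothing about how the result would carry over to other opponents.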
Last modified: Thu, 15 Apr 21 08:11:13 -0700