Computer Chess Club Archives




Subject: Re: A question about statistics...

Author: Robert Hyatt

Date: 13:02:56 01/04/04


On January 04, 2004 at 13:30:34, Roger Brown wrote:

>>I think you need to do something in 60 minutes at least, plus some sort of
>>secondary time control or increment.
>Hello Dr. Robert Hyatt,
>That means one hour (plus an increment) per engine, right?

Yes, something in that range.

>This is of course quite distressing.  That timecontrol would yield a game in two
>hours, twelve games a day, eighty-four games in a week!  My computer will begin
>to cry, unborn children will start fidgeting, comets will fall....
>Tell me something, with all the results of the hundreds (thousands (?)) of games
>that your engine has played over the years, is it possible that you could
>extract a rating based on the short timecontrol games (unless 60 minutes is
>short - which it is for human games - in which event the experiment is not
>feasible) against the long timecontrol games?

The problem is that Crafty is not stable.  New releases come out sometimes
twice a week, so comparing old and new games would really be difficult to
interpret.  Commercial programs are easier since they come out once or
twice per year, and there is a longer time to play a significant number of games
with no changes whatsoever to the program.

>I could then take the upper bound of the short timecontrol games as a useful
>starting point for my test.
>Two hour games are not going to work on my machine...

All I can suggest, then, is to go shorter.  I.e., 10 minutes + 10 seconds per
move increment might be a reasonable start, since that will at least avoid
the <1 second moves near the end of a sudden-death time control.
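As a back-of-the-envelope check on throughput (the 60-move game length below is an assumed figure, not from the post), the wall-clock cost of an increment control works out like this:

```python
def game_minutes(base_min, inc_sec, moves=60):
    """Rough wall-clock length of one game, counting BOTH sides, assuming
    each side uses its full base time plus the increment on every move."""
    per_side = base_min + moves * inc_sec / 60.0
    return 2 * per_side

# 10 minutes base + 10 seconds/move, ~60 moves per side:
print(game_minutes(10, 10))   # 40.0 minutes per game, i.e. ~36 games per day
```

So 10+10 cuts Brown's two-hour game to roughly forty minutes while still guaranteeing at least ten seconds of thought per move.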

>>If you look however, you will see an IM win the blitz events on ICC or
>>at other places, because blitz is simply a different game.
>Suggesting that a possible way forward is to construct a blitz ratings list and
>a separate longer timecontrol list.  Now that is going to create all sorts of...
>>This depends on the strength of the two players.  The wider the gap, the
>>fewer games you need to play.  An easy example is to pick two players on ICC
>>and search for all games between them.  Pick one player's perspective and
>>record a win as 1, a draw as .5 and a loss as 0.  After you do a few hundred
>>such games, look at the string of results.  Do you see a consecutive
>>group you could pick that shows A to be stronger?  Another group that would
>>show B to be stronger?  That is what is wrong with a small sample-size.  You
>>might just start off at the front of either of those two groups, and if you
>>stop too soon, you get a biased result.
>Sorry to be such a bother but does the summary below make sense?
>(a)  Player A is much stronger than player B - established by a historical
>examination of the engines' performances on some rating list.
>(b)  Play 100 games between A and B at 5 minutes.  Note the score.
>(c)  Play some reasonable number of games between A and B at a much higher
>timecontrol - one that will not crash my PC or cause the other users to rebel.
>(d)  Compare (b) and (c) and see if they map well, that is, are the results of
>this test a useful example of the predictive power of 5 minute (or some other
>short timecontrol) games?  Does (c) map to (b) and do they map to the results of...

No.  The problem again becomes the number of games.  To see what I mean, play a
200 game match, one minute per game.  Look at the 0-.5-1 results for a single
program.  If you look at the string carefully, you will likely see somewhere
where there is a string of 15+ losses to 5 wins, and then elsewhere you will
find the opposite.  If you play 20 games total, how do you know you didn't
get one of those odd sample sets???
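The streak effect described above is easy to reproduce with a simulation.  This sketch assumes two engines of exactly equal strength and a 30% draw rate (both assumed parameters, not from the post), generates a 200-game result string, and reports the worst and best 20-game stretches:

```python
import random

def simulate_match(n=200, p_win=0.35, p_draw=0.30, seed=1):
    """Simulate n games between two fixed, equal-strength engines and
    return the 0/.5/1 result string from one engine's perspective."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        r = rng.random()
        results.append(1.0 if r < p_win else 0.5 if r < p_win + p_draw else 0.0)
    return results

def extreme_windows(results, k=20):
    """Lowest and highest score over any k consecutive games."""
    sums = [sum(results[i:i + k]) for i in range(len(results) - k + 1)]
    return min(sums), max(sums)

res = simulate_match()
print(sum(res), extreme_windows(res))
# the overall score hovers near 100/200, yet the worst and best 20-game
# windows routinely look like decisive results on their own - which is
# exactly the trap of stopping after a short match
```

A 20-game sample that happens to start inside one of those stretches will confidently report the wrong winner.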

>Does this sound reasonable?
>Of course, between new engines it is not possible to conduct step (a), is it?
>So I guess I will start with Crafty...
>>Between two programs, it can be very significant.  But you can answer this
>>with experimentation, A vs B with learning on, then A with learning on vs
>>B with learning off.
>Back to the original questions: how many games, what timecontrol, etc.  Should I
>be able to do this experiment, then it should answer the learning vs. not
>learning issue - as well as any issue to do with what is the minimum work which
>can be done in order to say something meaningful about the result of a match A
>vs B.

If I were testing this, I would do at least 200 games with learning and
200 without, and even then the margin of error could be very high if you
pick a program that is very close to Crafty's playing strength.  If you pick
one much worse or much better, then fewer games will do fine.
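To put a rough number on "very high," the 95% margin of error on a 200-game match score can be sketched as follows (the 30% draw rate and the 1.96 normal quantile are standard assumptions, not figures from the post):

```python
import math

def score_error(n_games, p=0.5, draw_rate=0.3):
    """Approximate 95% margin of error on the match score fraction,
    treating each game as a 0/.5/1 outcome with mean score p."""
    p_win = p - draw_rate / 2          # win rate needed to average p
    p_loss = 1 - draw_rate - p_win
    # per-game variance of the score around its mean p:
    var = p_win * (1 - p) ** 2 + draw_rate * (0.5 - p) ** 2 + p_loss * p ** 2
    return 1.96 * math.sqrt(var / n_games)

print(round(score_error(200), 3))   # ~0.058: even 200 games pin the score
                                    # down only to about +/- 6 percentage points
```

Against a near-equal opponent that +/- 6% spans everything from a clear win to a clear loss, which is why a mismatched opponent (where the true score sits far from 50%) needs far fewer games.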

>Thanks for your time.
