Author: Roger Brown

Date: 10:30:34 01/04/04

```>I think you need to do something in 60 minutes at least, plus some sort of
>secondary time control or increment.

Hello Dr. Robert Hyatt,

That means one hour (plus an increment) per engine, right?

This is of course quite distressing.  That timecontrol would yield a game in two
hours, twelve games a day, eighty-four games in a week!  My computer will begin
to cry, unborn children will start fidgeting, comets will fall....

Tell me something: with all the results of the hundreds (thousands?) of games
that your engine has played over the years, could you extract a rating based on
the short timecontrol games and compare it against one based on the long
timecontrol games?  (Unless 60 minutes itself counts as short, which it does for
human games, in which event the experiment is not feasible.)

I could then take the upper bound of the short timecontrol games as a useful
starting point for my test.

Two hour games are not going to work on my machine...

>If you look however, you will see an IM win the blitz events on ICC or
>at other places, because blitz is simply a different game.

That suggests a possible way forward: construct a blitz ratings list and a
separate longer timecontrol list.  Now that is going to create all sorts of
confusion!

>This depends on the strength of the two players.  The wider the gap, the
>fewer games you need to play.  An easy example is to pick two players on ICC
>and search for all games between them.  Pick one player's perspective and
>record a win as 1, a draw as .5 and a loss as 0.  After you do a few hundred
>such games, look at the string of results.  Do you see a consecutive
>group you could pick that shows A to be stronger?  Another group that would
>show B to be stronger?  That is what is wrong with a small sample-size.  You
>might just start off at the front of either of those two groups, and if you
>stop too soon, you get a biased result.

Sorry to be such a bother but does the summary below make sense?

(a)  Player A is much stronger than player B - established by a historical
examination of the engines' performances on some rating list.

(b)  Play 100 games at 5 minutes.  Note the score.

(c)  Play some reasonable number of games between A and B at a much longer
timecontrol, few enough that they will not crash my PC or cause the other users
to rebel.

(d)  Compare (b) and (c) and see if they map well, that is, are the results of
this test a useful example of the predictive power of 5 minute (or some short,
short timecontrol) games?  Does (c) map to (b) and do they map to the results of
(a)?

Does this sound reasonable?

Of course, between new engines it is not possible to conduct step (a).

>Between two programs, it can be very significant.  But you can answer this
>with experimentation, A vs B with learning on, then A with learning on vs
>B with learning off.

Back to the original problem...how many games, what timecontrol etc.  If I am
able to do this experiment, it should answer the learning vs. not learning
issue - as well as the question of the minimum work needed in order to say
something meaningful about the result of a match A vs B.

:-)