Author: Uri Blass
Date: 01:06:05 04/08/05
On April 08, 2005 at 01:45:47, Stephen A. Boak wrote:

>On April 07, 2005 at 11:29:03, Daniel Pineo wrote:
>
>>On April 06, 2005 at 19:58:17, Uri Blass wrote:
>>
>>>take a chess program.
>>>
>>>Your target is to find the difference in rating between the program and a
>>>program that plays random moves.
>>
>>That's actually a good way to define an elo system that isn't strictly relative
>>like the one we have now.
>
>Good way to define an elo system?? Could there be any _worse_ way? !!
>
>[I guess there could. The baseline program may be one that intentionally makes
>as many losing moves as possible (plays 'give away' or 'Loser's' chess variant;
>enables Helpmates wherever possible).]
>
>Time to study some important books (Elo description, and general theory of
>statistics).
>
>Good luck to you and Uri! And a whole lot of time! I think both of you will
>need the 'help'.
>
>Here are my 'random' thoughts :)
>Please excuse any confusion they may bring.
>
>The suggestion has several _BIG_ problems.
>
>1. All Elo systems are 'relative'. This one attempts to measure strength
>(relatively!) against a random move 'program'.
>
>IMO, it will largely be unsuccessful in its goal.
>
>Any realistic output will occur in the most trivial of situations--where the
>measured programs produce random moves a large part of the time (but not all the
>time). Thus the results will provide no meaningful measure for typically
>stronger (always non-random moving) beginning programs.

It is clearly possible that you are right, but it may be interesting to find out.

>2. The real power of an Elo system to measure relative strengths of programs is
>predicated on the program to be rated being tested (competed) against many other
>already rated members of a player pool.
>
>Measuring each program individually against a single baseline player is
>ridiculous.
>
>It may produce a numerical score (hence rating figure) for each program competed
>against the baseline random move generator, but it will in no way accurately
>rank the programs to predict how they will perform against each other (or
>against a human).
>
>Why? Because each rated program has not been rated against multiple rated
>players in the same pool, but only against a static, single opponent (the random
>move generator), which is a different (and trivially small) pool of its own.

This is not exactly what I suggested. The pure random move generator will only play against a program that plays random moves in 99% of the cases, and the original program will only play against a version of itself that plays random moves in 1% of the cases. There will be 100 matches in all:

1) 0% random against 1% random
2) 1% random against 2% random
...
100) 99% random against 100% random

Note that a program that plays random moves in 1% of the cases may be lucky enough never to play a random move in a game that is drawn in 30 moves, or lucky enough to play random moves only in situations where they do not change the result.

>
>3. As with all statistically based measurement systems, the measured rating is
>most accurate when the player plays a range of opponents within, say, one or two
>standard deviations of his own rating.
>
>When a non random-moving program (say, a bean-counter program only) plays a
>random-moving program, I would guess that the random moving program would
>*never* win a game ... because its strength is so far, far beneath that of
>virtually all non-random programs created with the goal of winning chess games.
>I could be wrong ... but ....

You are wrong in theory, for a different reason than the one you suggest: the random program may get lucky and play perfectly in something like 1 in 10^100 of its games. In practice, though, I agree that you will never see it win or draw a game against a bug-free non-random program.
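The ladder of matches described above can be sketched in code. This is a minimal simulation under stated assumptions, not a real experiment: `play_match` is a hypothetical toy stand-in that does not call any chess engine, and the step and game counts are illustrative. Only `elo_diff` uses the standard logistic Elo formula.

```python
import math
import random

def elo_diff(score):
    """Rating difference implied by a score fraction, via the standard
    logistic Elo model (clamped so a 100% or 0% score stays finite)."""
    score = min(max(score, 1e-6), 1 - 1e-6)
    return -400 * math.log10(1 / score - 1)

def play_match(p_random_a, p_random_b, games):
    """Hypothetical stand-in for a real engine match; returns A's score.
    Toy model only: the side that randomizes less wins proportionally
    more often.  A real experiment would run actual engine games here."""
    wins = 0
    for _ in range(games):
        p_a_wins = 0.5 + 0.5 * (p_random_b - p_random_a)
        wins += random.random() < p_a_wins
    return wins / games

def ladder_rating_gap(steps=100, games=1000):
    """Sum the Elo gaps of adjacent rungs (0% vs 1%, 1% vs 2%, ...) to
    estimate the total gap between the original program and the pure
    random mover."""
    total = 0.0
    for k in range(steps):
        score = play_match(k / steps, (k + 1) / steps, games)
        total += elo_diff(score)
    return total
```

The point of the ladder is that each rung is a match between nearly equal opponents, which is exactly where Elo estimates are most reliable; the per-rung differences are then summed to bridge the huge gap between the original program and the pure random mover.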
I also explained that my idea is not to play one match between the random and the non-random program, but many matches in which the difference in randomness is small.

<snipped>

>5. Let A & B each play R (a random-move program) 100,000 games. Say A scores
>99,999 to 1. Say B scores 99,998 to 2.
>
>What ratings might the suggested process produce for A & B.

There is not enough data to know, but this is not the experiment that I suggested.

<snipped>

>10. Let me know when you have enough games played. I'd be happy to compute the
>Elo ratings for you. What you do with the figures after that is beyond me. I
>would have no use for such 'ratings'. :)

Unfortunately I do not plan to do the experiment myself (I do not have enough computer time for it), but it may be interesting if someone else is interested in doing it.

Regards,
Uri
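For reference, the ratings implied by Boak's hypothetical scores in point 5 can be checked directly with the standard logistic Elo formula; the notable part is that a single extra loss in 100,000 games moves the estimate by roughly 120 Elo points, which illustrates how noisy a rating measured only against the random mover would be.

```python
import math

def elo_diff(score):
    """Rating difference implied by a score fraction, standard logistic Elo model."""
    return -400 * math.log10(1 / score - 1)

# Boak's hypothetical scores against the random mover R (point 5):
elo_a = elo_diff(99_999 / 100_000)  # A: roughly +2000 Elo above R
elo_b = elo_diff(99_998 / 100_000)  # B: roughly +1880 Elo above R
```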
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.