Author: Don Dailey
Date: 09:25:49 07/25/99
>People must understand that an internet rating can be very misleading. I know a
>rating has importance to you. But it is clear it is much more important to you
>that your program play great chess to achieve its high rating. The same can be
>said about Bob. I know if you and Bob wished to, because you have the tools and
>knowledge to do it, you could spike your program's rating about as high as you
>want on ICC by using every dirty trick in the book.
The ratings on ICC are very crude and rough for a number of reasons and
even cruder for computers. Here are some of the reasons:
1. The K factor is quite high to make play exciting to humans.
2. The time controls for a given category have a great range. For
instance, there is a world of difference between 3-0 and 5-10, but
they are both considered Blitz games. The difference is quite
pronounced for a computer playing a human.
3. You have complete freedom in choosing your opponents and
time controls. You can refuse any match and noplay anyone you want.
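The K factor in point 1 scales how far a single result moves a rating. A minimal sketch of the standard logistic Elo update on a 400-point scale; ICC's exact formula and K value are not given here (later in this post it is described as linear), so the K values in the demo are illustrative assumptions only:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Logistic Elo expectation of A's score against B (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update(r_a: float, r_b: float, score: float, k: float) -> float:
    """A's rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    return r_a + k * (score - expected_score(r_a, r_b))


if __name__ == "__main__":
    # The same short streak against an equally rated opponent, replayed
    # with a high and a low K factor: the rating swing scales with K.
    results = [1.0, 1.0, 1.0, 0.0, 0.5]
    for k in (32.0, 8.0):
        r = 2450.0
        for s in results:
            r = update(r, 2450.0, s, k)
        print(f"K={k:g}: {r:.1f}")
```

With a high K, a lucky streak of a few games is enough to push a rating well away from where it "belongs", which is one reason the ICC numbers are so rough.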
Despite all of this, I think it's fairly hard to drift more than 150
points in either direction from your "correct" rating. (I'm sure that
this post will inspire a host of counterexamples and anecdotes.) But
my experience with Occam, a new and very primitive chess program I've
started to work on, has produced a range of ratings from about the low
2200s to the low 2500s. That's a 300-point range, and the lower end
hasn't been seen in a long time, probably due to some improvements I've
made in the program. Almost all of the time, it is within 50 points of
about 2450. Given the imperfections I stated earlier, that's
not too terribly bad.
The way to view the ratings on ICC is with a grain of salt.
I am trying to use them for comparison purposes only. I don't compare the
ratings to real-world ratings, even though they seem to roughly correlate.
The ratings seem a little inflated to me (even compared to the inflated
USCF ratings), but this is my own subjective judgement and I cannot back
it up. How do the Grandmasters do in this regard?
So I use the games for debugging, for finding weaknesses and, in a very
rough way, for observing program improvements. The way I do this is to
simply keep all the games and rate them myself with a program that parses
the PGN files and recomputes the ratings using a more stable (much
smaller) K factor. If you are playing a lot of games, you can make
your own home-brewed rating about as stable as you wish using this
technique.
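The re-rating step can be sketched roughly as follows. This is not the program described above, only a minimal illustration under assumptions: it reads only PGN tag pairs, updates only our own program's rating with a small K (4 here), and holds each opponent at the rating recorded in the game's `WhiteElo`/`BlackElo` tags. The function names and the sample games are hypothetical.

```python
import re

# Minimal PGN tag-pair matcher, e.g. [White "Occam"].
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')
SCORES = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}


def expected_score(r_a: float, r_b: float) -> float:
    """Logistic Elo expectation of A's score against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def parse_games(pgn_text: str) -> list[dict[str, str]]:
    """Return one dict of tag pairs per game; movetext is ignored.

    A tag line that follows movetext is taken to start a new game.
    """
    games: list[dict[str, str]] = []
    tags: dict[str, str] = {}
    in_tags = False
    for raw in pgn_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = TAG_RE.match(line)
        if m:
            if not in_tags and tags:
                games.append(tags)
                tags = {}
            tags[m.group(1)] = m.group(2)
            in_tags = True
        else:
            in_tags = False
    if tags:
        games.append(tags)
    return games


def rerate(games, me: str, start: float, k: float = 4.0) -> float:
    """Replay games in order, updating only `me` with a small K factor.

    Opponents are held at the rating recorded in the PGN (WhiteElo/BlackElo).
    """
    rating = start
    for g in games:
        result = SCORES.get(g.get("Result", ""))
        if result is None:
            continue  # unfinished ("*") or missing result
        if g.get("White") == me:
            opp, score = g.get("BlackElo", ""), result
        elif g.get("Black") == me:
            opp, score = g.get("WhiteElo", ""), 1.0 - result
        else:
            continue
        if not opp.isdigit():
            continue  # no usable opponent rating
        rating += k * (score - expected_score(rating, float(opp)))
    return rating


SAMPLE = """[Event "hypothetical blitz game"]
[White "Occam"]
[Black "OpponentA"]
[WhiteElo "2450"]
[BlackElo "2450"]
[Result "1-0"]

1. e4 e5 1-0

[Event "hypothetical blitz game"]
[White "OpponentB"]
[Black "Occam"]
[WhiteElo "2450"]
[BlackElo "2458"]
[Result "1/2-1/2"]

1. d4 d5 1/2-1/2
"""

if __name__ == "__main__":
    games = parse_games(SAMPLE)
    print(f"{len(games)} games, re-rated: {rerate(games, 'Occam', 2450.0):.1f}")
```

Lowering `k` makes the home-brewed number move more slowly; with enough games it can be made about as stable as you like, at the cost of reacting more slowly to real improvements.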
The other thing you can do is limit the time controls you use against
humans. You do this if you want to maximize the stability of your
computer's rating. A lot of computers do this and try to maximize
their performance against humans by using zero time increments. Many
won't play other computers; I have no idea why. Since most of the
computers doing this are not operated by the original authors but are
Crafty clones, you cannot argue that you are interested in studying the
games to learn how to play against humans. So I can only assume it is
done in an attempt to maximize the rating of the machines. I have serious
doubts that this has any impact on a computer's rating, though;
it's based on the illogical assumption that a computer with a given
rating is more likely to beat you than a human with the same rating.
This indicates a lack of understanding of how the rating system works.
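The point can be made concrete with the standard logistic Elo model (assumed here; the post does not give ICC's exact formula). The expected score $E_A$ and the update $\Delta R_A$ depend only on the two ratings, so if both players are accurately rated, the expected rating change is zero no matter who, or what, the opponent is:

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad
\Delta R_A = K\,(S_A - E_A)

% If R_A and R_B are accurate, the mean score equals the expectation:
\mathbb{E}[S_A] = E_A
\;\Longrightarrow\;
\mathbb{E}[\Delta R_A] = K\,(\mathbb{E}[S_A] - E_A) = 0
```

Nothing in the formula distinguishes a computer from a human; only a mis-rated opponent changes the expectation.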
I suspect that part of the motivation of the Crafty cloners is that
they like to identify with a winner, even if they themselves are not one.
Especially in the case of those that won't play other computers, I guess
there is more thrill in beating a human.
I let Occam play anyone (computers or humans) as long as they are not
more than 300 points weaker than Occam. I am probably going to relax this
restriction even more. With the linear rating formula they use, there
is absolutely no motivation to play anyone a lot weaker than you:
even though you are almost certain to win, you win zero points, and
that is a terrible deal. The correct formula (at least in principle)
makes it not matter whom you play, although most people are superstitious
about things like this.
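Here is a sketch of why a linear formula punishes playing much weaker opponents while the logistic one does not. The linear rule is an assumption modeled on the classic linear approximation (expected score 0.5 + d/800, clamped to [0, 1], so a win is worth nothing once the gap reaches 400 points); ICC's actual formula may differ. The sketch also assumes the logistic curve gives each side's true expected score.

```python
from collections.abc import Callable


def logistic_expectation(diff: float) -> float:
    """Expected score for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def linear_expectation(diff: float) -> float:
    """Assumed linear approximation: 0.5 + diff/800, clamped to [0, 1]."""
    return min(1.0, max(0.0, 0.5 + diff / 800.0))


def mean_change(diff: float, k: float,
                charged: Callable[[float], float]) -> float:
    """Average rating change per game when the true expected score follows
    the logistic curve but the server charges `charged(diff)`."""
    return k * (logistic_expectation(diff) - charged(diff))


if __name__ == "__main__":
    print("gap   linear  logistic")
    for diff in (0, 100, 300, 500, 800):
        lin = mean_change(diff, 32.0, linear_expectation)
        logi = mean_change(diff, 32.0, logistic_expectation)
        print(f"{diff:4d}  {lin:+6.2f}  {logi:+6.2f}")
```

Under these assumptions the logistic column is zero at every gap, while the linear column goes negative for large gaps (about -1.7 points per game at a 500-point gap with K = 32): the win pays nothing, but the rare loss still costs the full K.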
About spiking your ratings with dirty tricks:
Very easy to do. Play humans with a 3-0 time control to start with. You
can play computers with any time control. Observe all the opponents you
play and keep track of their ratings. Play them only when they are at
the upper end of their own personal range. In fact, if you really wanted
to get fancy, you could predict your probability of beating each opponent
based on past experience and only play against them when the expected
rating gain is positive. This is very dirty, but it will inflate your
rating significantly over time. If you use this technique, you don't care
which time controls you use, as long as you keep track of the program's
results for each time control. It's always worth playing a particular
match if the expected rating gain is positive. It doesn't
matter whom you play or what the time control is.
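The arithmetic behind this decision rule can be sketched as follows. All names (`Record`, `expected_gain`, `should_play`), the K of 32, and the pseudo-game smoothing toward the Elo expectation are assumptions for illustration, and the logistic Elo formula stands in for whatever ICC actually uses. The server charges you against the opponent's *current* rating, while your belief about the true score comes from their *typical* rating and your head-to-head record for that time control.

```python
from dataclasses import dataclass


def expected_score(r_a: float, r_b: float) -> float:
    """Logistic Elo expectation of A's score against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


@dataclass
class Record:
    """Hypothetical tally of past results vs. one opponent at one time control."""
    wins: int = 0
    draws: int = 0
    losses: int = 0

    def estimated_score(self, prior: float, prior_games: float = 4.0) -> float:
        """Observed score fraction, shrunk toward `prior` by adding
        `prior_games` pseudo-games (an assumed smoothing choice)."""
        games = self.wins + self.draws + self.losses
        points = self.wins + 0.5 * self.draws
        return (points + prior * prior_games) / (games + prior_games)


def expected_gain(my_rating: float, opp_current: float, opp_typical: float,
                  record: Record, k: float = 32.0) -> float:
    """Expected rating change from playing one more game."""
    charged = expected_score(my_rating, opp_current)   # what the server expects
    believed = record.estimated_score(expected_score(my_rating, opp_typical))
    return k * (believed - charged)


def should_play(my_rating: float, opp_current: float, opp_typical: float,
                record: Record, k: float = 32.0) -> bool:
    """Accept the match only if the expected rating gain is positive."""
    return expected_gain(my_rating, opp_current, opp_typical, record, k) > 0.0


if __name__ == "__main__":
    # Opponent normally around 2400 but currently on a hot streak at 2480.
    print(should_play(2450.0, 2480.0, 2400.0, Record()))
    # Same opponent at their usual level, but they have beaten us often.
    print(should_play(2450.0, 2400.0, 2400.0, Record(wins=1, losses=5)))
```

With no head-to-head history the rule reduces to "play them when they are above their usual rating"; as results accumulate, the record increasingly overrides the rating-based prior.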
- Don
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.