Computer Chess Club Archives


Subject: Re: Does Analysing database BT-2630 or BS-2830 give correct Elo rating?

Author: Bruce Moreland

Date: 10:07:04 03/30/98



On March 30, 1998 at 08:01:13, Eran wrote:

>I am a little bit confused with using Analyse database menu with BT-2630
>or BS-2830 in Rebel9. I want to know the accurate Elo rating of my
>computer using Rebel9, and I am not sure if using Analyse database menu
>with BT-2630 or BS-2830 will give correct Elo rating for my computer.
>
>Does Analysing database BT-2630 or BS-2830 give correct Elo rating? Can
>I go ahead and use it easily? Will a new Analysis.dat database tell me
>about the accurate Elo rating for my computer using Rebel9?

I don't know exactly how the BT and BS tests were produced, but I think
that they were probably designed to return specific numbers.

You take a list of programs and their ratings (from the SSDF list), make
your test, run your test with these programs, then perform some math on
the solution times in order to produce an Elo rating.

If the ratings don't correspond to your original list of ratings, you
modify the test and/or modify the math, and try again.

So for a fixed set of programs running on specific hardware, you get the
numbers that you want.  There is no measurement of strength here; you've
just created a more complicated way to get a number that you already
think you know.
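
To make that loop concrete, here is a minimal sketch in Python.  It is
not how BT-2630 or BS-2830 were actually built (I said above that I
don't know that); the program names, list ratings, solution times, and
the straight-line formula are all invented for illustration.  It only
shows how a formula fitted to known ratings hands those same ratings
back.

```python
# Hypothetical calibration sketch: fit rating = intercept + slope * avg_time
# by ordinary least squares. Every name and number below is made up.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# (program, rating taken from a published list, average solution time in seconds)
reference = [
    ("ProgramA", 2400, 300.0),
    ("ProgramB", 2350, 360.0),
    ("ProgramC", 2300, 390.0),
    ("ProgramD", 2250, 450.0),
]

times = [seconds for _, _, seconds in reference]
ratings = [rating for _, rating, _ in reference]
intercept, slope = fit_line(times, ratings)

def suite_rating(avg_seconds):
    """Rating the fitted formula assigns to a given average solution time."""
    return intercept + slope * avg_seconds

# The "test" now returns roughly the numbers it was fitted to.
for name, rating, seconds in reference:
    print(f"{name}: list {rating}, suite {suite_rating(seconds):.0f}")
```

The fit reproduces the input ratings closely because that is what it
was tuned to do, not because solution time was shown to measure
playing strength.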

Things get a little random when you change the hardware, or add a new
program, but because programs are similar, and because the math is
designed to reward faster solution times due to faster hardware, the
rating of a new program will probably be about the same as the ratings
of the old programs, and the rating of one of the "tested" programs,
running on faster hardware, will be higher than its old rating by a
satisfying amount.

This might conform to reality by something other than a coincidence, but
it doesn't necessarily *have* to.

It's unfair to the old programs, since they get the ratings they are
"supposed" to get.  It's unfair to the new programs, since they may
return a number that is not similar to their Swedish number.  And when
you start running on hardware that is five times faster than what was
used originally, who knows what the math will do to you.
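
As a rough illustration of the hardware point, the sketch below
compares two hypothetical time-to-rating formulas.  Neither is taken
from BT-2630 or BS-2830, and the constants are invented.  Both give
2400 at an assumed original average solution time of 300 seconds, but
they disagree once the times shrink.

```python
import math

# Two hypothetical formulas, both returning 2400 at 300 seconds.
# The constants are assumptions for illustration, not the real test math.

def linear_rating(avg_seconds):
    """Straight-line formula: can never exceed 2700, however fast the machine."""
    return 2700.0 - 1.0 * avg_seconds

def log_rating(avg_seconds):
    """Logarithmic formula: adds 100 points for every halving of solution time."""
    return 2400.0 - 100.0 * math.log2(avg_seconds / 300.0)

# Compare the formulas as the hardware gets faster.
for speedup in (1, 2, 5, 25):
    seconds = 300.0 / speedup
    print(f"{speedup:>2}x: linear {linear_rating(seconds):.0f}, "
          f"log {log_rating(seconds):.0f}")
```

Both formulas fit the original hardware equally well, so the
calibration data cannot tell you which one to believe once you leave
the range it was fitted on.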

And when someone makes a new version of their program, they have the
opportunity to tune against the old test, so the new version gets a
higher rating assigned to it.  This is not necessarily "cheating".  The
test positions are interesting, and they give you ideas, and
implementing the ideas coincidentally makes you do better on the
positions.

So what I think you are getting when you run one of these suites is a
number that has been determined for you by the test suite designers, in
consultation with the Swedish list, with possible later messing around
by the program author, and with some fluctuation due to variations in
hardware.

bruce



Last modified: Thu, 15 Apr 21 08:11:13 -0700
