Computer Chess Club Archives



Subject: Re: Rules of thumb for strength estimation?

Author: Peter Fendrich

Date: 14:19:10 04/25/03



On April 25, 2003 at 14:47:17, Tim Foden wrote:

>On April 25, 2003 at 14:24:23, Peter Fendrich wrote:
>
>>On April 25, 2003 at 06:13:51, Tim Foden wrote:
>>
>>>On April 25, 2003 at 04:20:33, Albert Bertilsson wrote:
>>>
>>>>Hi!
>>>>
>>>>When testing my new engine versions against older versions I use the nice
>>>>WhoIsBetter tool to determine whether or not the new version is likely to be
>>>>stronger.
>>>>
>>>>But I would also like to know how much stronger. Just an estimate would be
>>>>nice, just as a reward for the work. Putting the engine on FICS takes too long,
>>>>so I wonder, are there any rules of thumb that I can apply?
>>>>
>>>>Like:
>>>>New engine scores 3 to 2?
>>>
>>> 60% = +70.44 ELO.
>>>
>>>>New engine scores 2 to 1?
>>>
>>> 66.66% = +120.36 ELO.
>>>
>>>>New engine scores 3 to 1?
>>>
>>> 75% = +190.85 ELO.
>>>>
>>>>Regards Albert
>>>
>>>Dann Corbit has a tool called USCF which can calculate such numbers.  I have a
>>>modified version here.
>>>
>>>Here is a table of outputs which may be useful:
>>>
>>>A win percentage of 50% gives a rating difference of +0.00 ELO
>>>A win percentage of 55% gives a rating difference of +34.86 ELO
>>>A win percentage of 60% gives a rating difference of +70.44 ELO
>>>A win percentage of 65% gives a rating difference of +107.54 ELO
>>>A win percentage of 70% gives a rating difference of +147.19 ELO
>>>A win percentage of 75% gives a rating difference of +190.85 ELO
>>>A win percentage of 80% gives a rating difference of +240.82 ELO
>>>A win percentage of 85% gives a rating difference of +301.33 ELO
>>>A win percentage of 90% gives a rating difference of +381.70 ELO
>>>
>>>Cheers, Tim.
>>
>>One can't use this table or ELOSTAT or any other ELO rating formula.
>>It will produce a figure but it doesn't mean anything.
>
>:)  I know what you mean... but to be pedantic, I think you really mean
>shouldn't rather than can't.  I.e. I can perfectly easily use this table to make
>predictions about changes in strength when there are few games.  Thus I
>demonstrably _can_ do it.  But I agree that I shouldn't really.  And in fact I
>don't.  :)
>
>>1) The ELO formula is based on the "Normal distribution" which is just an
>>estimate of the real distribution.
>
>OK.
>
>>In order to be used as an estimate you need
>>something like 30-50 games or more.
>
>Again, I disagree.  :)  It _can_ be used as an "estimate" however many games you
>have... it just won't be a very accurate one. :)
>
>>2) Even if it were a perfect estimate, so few games give a very unstable figure.
>>For instance, the difference between 2-2 and 2.5-1.5 gives a big difference in
>>ELO but represents a very small difference in the results.
>
>Very true.
>
>Cheers, Tim.
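
For reference, the figures in the table quoted above match the common logistic form of the Elo expectancy formula, D = 400 * log10(p / (1 - p)), where p is the score fraction. The sketch below assumes that formula; it is not the source of the USCF tool, and the function name elo_diff is made up for illustration.

```python
import math

def elo_diff(score):
    """Rating difference implied by a score fraction in (0, 1).

    Assumes the logistic Elo model: D = 400 * log10(p / (1 - p)).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Rebuild the table in 5% steps from 50% to 90%.
for pct in range(50, 95, 5):
    diff = elo_diff(pct / 100)
    print(f"A win percentage of {pct}% gives a rating difference of {diff:+.2f} ELO")
```

A score of 0% or 100% has no finite rating difference under this model, which is why the function rejects those inputs.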

Well Tim, I won't argue against that, but _can't_ here was a 'sloppy' way to say
_can't be done properly_ or something like that. I don't know if this makes any
sense in English though...
/Peter
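
The instability point (2-2 versus 2.5-1.5) can be made concrete with a small sketch. It assumes the same logistic formula that the quoted table is consistent with, and the helper names are hypothetical. In a 4-game match a single extra half point moves the estimate from 0 to roughly 89 ELO, while the same half point matters far less over a long match.

```python
import math

def elo_from_result(points, games):
    """Rating difference implied by a match result, logistic Elo model (assumed)."""
    score = points / games
    if not 0.0 < score < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

def half_point_swing(games):
    """Change in the estimate caused by one extra half point on top of an even match."""
    even = games / 2
    return elo_from_result(even + 0.5, games) - elo_from_result(even, games)

# The same half point, spread over matches of increasing length.
for games in (4, 40, 400):
    swing = half_point_swing(games)
    print(f"{games:4d} games: one extra half point shifts the estimate by {swing:+.1f} ELO")
```

This only shows how sensitive the point estimate is to one result; it says nothing about whether the formula is a good model for so few games, which is the other objection raised in the thread.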




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.