Computer Chess Club Archives



Subject: Re: Value of playing different versions of a program against each other

Author: Uri Blass

Date: 07:24:55 01/07/03



On January 07, 2003 at 10:17:03, Lieven Clarisse wrote:

>On January 06, 2003 at 18:24:31, Dann Corbit wrote:
>
>>On January 06, 2003 at 17:40:53, Lieven Clarisse wrote:
>>
>>>On January 06, 2003 at 16:56:35, Tom King wrote:
>>>
>>>>Hi all,
>>>>
>>>>What do people think about playing different versions of your program against
>>>>each other as a way of testing?
>>>>
>>>>I'm playing around with it right now, between v0.07 and a newer version of my
>>>>program. The newer version is winning handsomely: +24,=18,-10.
>>>>
>>>>This implies a reasonably impressive increase in strength, almost 100 ELO. Ok,
>>>>ok, it's a small sample, so the margin of error could be big.
>>>>
>>>>However, my gut feel is that playing different versions of your programs tends
>>>>to overstate the strength differences. What do people think?
>>>>
>>>
>>>The best way IMHO is to test it against engines of more or less equal
>>>strength. You can use the WBEC rating list to get an idea of the strengths of
>>>the different engines. Try to find a range where your program gets a 50% score
>>>(for instance, range 40-50 from the rating list). When you change your program
>>>and see that you get >60% (over a sufficiently large number of games), it is
>>>time to play against the range 35-45, etc. Strength is best measured when
>>>playing equal opponents.
>>
>>I disagree.  I think you get better results from two sets:
>>1.  Programs that are about 100 ELO weaker.
>>2.  Programs that are about 100 ELO stronger.
>>
>>When the programs are about the same strength, you get too much coin toss
>>effect.
>
>I have to disagree: the larger the ELO difference, the larger the margin of
>error, i.e. the more games you have to play to know the ELO difference.
>
>Say your engine has 1500 ELO and you lose 10 games against Ruffian. You have
>not gained any information about the engines' strengths: you know yours is
>significantly weaker, but by how much?
>
>At FICS, your RD decreases most if you play EQUAL opponents. I don't see why
>testing against engines at +/- 100 ELO would be better; the further you go
>away from its ELO, the larger the amount of lottery involved. If you play 5 games
>against an engine that has 100 points more than your engine, it may well be that
>you lose them all. Tiny improvements will be best reflected if you play engines
>of your own strength: say, in 100 blitz games, going from 50% wins to 55% wins.
>If you play against stronger opponents, the difference will be smaller and
>harder to see...

I agree with Dann.
Suppose that you make a change that causes your program to be more unstable in
its level of play.

If you test only against equal programs, you will not discover it.
If you test and find that you perform relatively better against
stronger programs and relatively worse against weaker programs, you may find it.
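
One way to picture this is a toy model, sketched here in Python. It is only an
assumption for illustration, not a measured property of any engine: the
program's effective strength in each game is its rating plus a zero-mean normal
offset, and the expected score is the usual Elo curve averaged over that
offset. The names expected_with_noise and sigma, and the value of 200 points,
are hypothetical.

```python
import math

def elo_expected(diff):
    """Standard Elo expected score for a rating advantage of `diff` points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def expected_with_noise(diff, sigma, steps=2001, width=6.0):
    """Average the Elo curve over a zero-mean normal per-game strength offset.

    `sigma` is a hypothetical per-game standard deviation in Elo points.
    The average is taken on a symmetric grid covering +/- `width` sigmas.
    """
    if sigma == 0:
        return elo_expected(diff)
    total = 0.0
    weight_sum = 0.0
    for i in range(steps):
        x = -width * sigma + 2.0 * width * sigma * i / (steps - 1)
        w = math.exp(-0.5 * (x / sigma) ** 2)  # unnormalized normal density
        total += w * elo_expected(diff + x)
        weight_sum += w
    return total / weight_sum

# diff < 0: opponent is stronger; diff > 0: opponent is weaker.
for diff in (-100, 0, 100):
    steady = expected_with_noise(diff, sigma=0.0)
    erratic = expected_with_noise(diff, sigma=200.0)
    print(f"{diff:+5d}  steady={steady:.3f}  erratic={erratic:.3f}")
```

Under this model the erratic version still scores 50% against an equal
opponent, so that test alone cannot tell the two versions apart, while its
score rises against the stronger opponent and falls against the weaker one.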

Uri
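
For reference, the "+24,=18,-10" result quoted at the start of the thread can
be turned into an ELO estimate with a rough margin of error. The sketch below
assumes the standard logistic Elo formula, independent games, and a normal
approximation for the interval; the function names are made up for this
example.

```python
import math

def elo_from_score(p, eps=1e-9):
    """Rating difference implied by a score fraction p (logistic Elo model)."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log of zero on 0% or 100% scores
    return 400.0 * math.log10(p / (1.0 - p))

def elo_estimate(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) in Elo points.

    The interval is a normal approximation on the mean score per game
    (z=1.96 gives roughly 95%), then mapped through the Elo formula.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the score, with results worth 1, 0.5 and 0 points.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * p ** 2) / n
    se = math.sqrt(var / n)
    return (elo_from_score(p),
            elo_from_score(p - z * se),
            elo_from_score(p + z * se))

est, low, high = elo_estimate(24, 18, 10)
print(f"estimate {est:+.0f}, approx. 95% interval {low:+.0f} to {high:+.0f}")
```

With these 52 games the estimate comes out at about +96, but the interval runs
from roughly +20 to +180, so the sample supports "stronger" much more firmly
than it supports "almost 100 ELO".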


