Subject: Re: Value of playing different versions of a program against each other

Author: Tom King

Date: 14:22:16 01/06/03

On January 06, 2003 at 17:03:01, Dann Corbit wrote:

>On January 06, 2003 at 16:56:35, Tom King wrote:
>>Hi all,
>>What do people think about playing different versions of your program against
>>each other as a way of testing?
>>I'm playing around with it right now, between v0.07 and a newer version of my
>>program. The newer version is winning handsomely: +24,=18,-10.
>>This implies a reasonably impressive increase in strength, almost 100 Elo. OK,
>>OK, it's a small sample, so the margin of error could be big.
>>However, my gut feel is that playing different versions of your programs tends
>>to overstate the strength differences. What do people think?
>That test demonstrates exactly what it measures:
>Win expectancy against previous versions of your own program.
>If you want to know win expectancy against other programs, you will have to test
>it separately.
>That said, there is probably going to be some correlation between your
>new program clubbing the old ones and how it fares against other programs.  On
>the other hand, you won't have any idea what the correlation is until you test

I will do some testing against other opponents soon, and I expect my changes to
be more or less a wash. We'll see. That's the weird thing about playing
different versions of programs. You don't want to get too excited if version X
hammers version Y. What you want is to see how X and Y do against a range of
opponents at different time controls. If X does significantly better than Y
against different opponents, *that's* the time to get excited.
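
For anyone who wants to check the "almost 100 Elo" figure and the size of the error margin on the +24,=18,-10 result quoted above, here is a rough sketch. It assumes the standard logistic Elo formula and a plain normal approximation for the 95% interval on the score. The function names are just mine for illustration, and this says nothing about how the result would carry over to other opponents.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (logistic Elo model).
    Only defined for 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_summary(wins, draws, losses, z=1.96):
    """Return (score, elo, elo_low, elo_high) for a head-to-head match.

    Assumption: games are independent and the score is roughly normal,
    which is a crude approximation for a sample this small.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return score, elo_diff(score), elo_diff(score - z * se), elo_diff(score + z * se)

score, elo, low, high = match_summary(24, 18, 10)
print(f"score {score:.3f}, Elo {elo:+.0f}, 95% interval {low:+.0f} to {high:+.0f}")
```

With 52 games this comes out at roughly +96 Elo, but the interval runs from about +20 to about +180, so the sample really is too small to say much more than "the new version is probably stronger against the old one".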


