Author: Richard Pijl
Date: 03:22:19 01/08/03
On January 07, 2003 at 10:17:03, Lieven Clarisse wrote:

>On January 06, 2003 at 18:24:31, Dann Corbit wrote:
>
>>On January 06, 2003 at 17:40:53, Lieven Clarisse wrote:
>>
>>>On January 06, 2003 at 16:56:35, Tom King wrote:
>>>
>>>>Hi all,
>>>>
>>>>What do people think about playing different versions of your program against
>>>>each other as a way of testing?
>>>>
>>>>I'm playing around with it right now, between v0.07 and a newer version of my
>>>>program. The newer version is winning handsomely: +24,=18,-10.
>>>>
>>>>This implies a reasonably impressive increase in strength, almost 100 ELO. Ok,
>>>>ok, it's a small sample, so the margin of error could be big.
>>>>
>>>>However, my gut feel is that playing different versions of your programs tends
>>>>to overstate the strength differences. What do people think?
>>>>
>>>
>>>The best way IMHO is to test it against engines of more or less equal
>>>strength. You can use the WBEC rating list to get an idea of the strengths of the
>>>different engines. Try to find a range where your program gets a 50% score (for
>>>instance, range 40-50 from the rating list). When you change your program and see that you
>>>get >60% (over a sufficiently large number of games), it is time to play against
>>>the range 35-45, etc. Strength is best measured when playing equal opponents.
>>
>>I disagree. I think you get better results from two sets:
>>1. Programs that are about 100 ELO weaker.
>>2. Programs that are about 100 ELO stronger.
>>
>>When the programs are about the same strength, you get too much coin toss
>>effect.
>
>I have to disagree: the larger the ELO difference, the larger the margin of
>error, i.e. the more games you have to play to know the ELO difference.
>
>Say your engine has 1500 ELO and you lose 10 games against Ruffian. Then you do not
>have any information about the engine's strength: you know it is significantly
>weaker, but by how much?
>
>At FICS, your RD decreases most if you play EQUAL opponents. I don't see why
>testing against engines at +/- 100 ELO would be better; the further you go
>away from your own ELO, the more lottery is involved. If you play 5 games
>against an engine that is 100 points stronger than yours, it can well be that
>you lose them all. Tiny improvements will be best reflected if you play at your own
>strength: say, in 100 blitz games going from a 50% score to a 55% score.
>If you play against stronger opponents, the difference will be smaller and harder
>to see...

If you have an error, it is more likely that you will notice it when playing against a weaker program. I always include both stronger and weaker programs (up to 300-400 ELO weaker) when testing the Baron. When the Baron loses against a weak program, I know that it is either a bad opening line (which should be corrected) or a bug/weakness in my program. For this it is, of course, important to choose programs that are stable (in performance) to test against.

Richard.
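Tom King's +24 =18 -10 result and the question of whether equal or unequal opponents measure strength better can both be checked numerically. The sketch below is a minimal illustration, assuming the usual logistic Elo curve (expected score 1/(1+10^(-D/400))), draws counted as half a point, and a normal (delta-method) approximation for the error. The function names `expected_score`, `elo_diff_from_score`, `match_estimate` and `elo_standard_error` are hypothetical and do not come from any tool mentioned in the thread; `elo_standard_error` additionally assumes no draws.

```python
import math


def expected_score(elo_gap: float) -> float:
    """Logistic Elo expected score for a player rated elo_gap points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Invert the logistic curve: Elo difference implied by a score fraction in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_estimate(wins: int, draws: int, losses: int) -> tuple[float, float]:
    """Return (Elo difference, approximate 1-sigma error in Elo) from a W/D/L record.

    Assumption: games are independent and a draw scores 0.5. The per-game
    score variance is taken from the observed results, then converted into
    Elo with the slope of the inverse logistic curve (delta method).
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / n
    diff = elo_diff_from_score(score)  # raises if the score is 0% or 100%
    # Per-game variance of the outcome (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    se_score = math.sqrt(var / n)
    # dD/dp of the inverse logistic curve: 400 / (ln 10 * p * (1 - p)).
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return diff, slope * se_score


def elo_standard_error(games: int, elo_gap: float) -> float:
    """Approximate 1-sigma Elo error after `games` games against an opponent
    elo_gap points away (sign does not matter). Assumes no draws, so each
    game is a Bernoulli trial with success probability expected_score(elo_gap).
    """
    p = expected_score(elo_gap)
    return 400.0 / (math.log(10.0) * math.sqrt(games * p * (1.0 - p)))


if __name__ == "__main__":
    diff, se = match_estimate(24, 18, 10)
    print(f"+24 =18 -10: {diff:+.0f} Elo, +/- {se:.0f} (1 sigma)")
    for gap in (0, 100, 200, 400):
        print(f"gap {gap:>3}: +/- {elo_standard_error(100, gap):.1f} Elo after 100 games")
```

Under these assumptions, +24 =18 -10 works out to roughly +96 Elo with a one-sigma error of about 40 Elo, which fits both "almost 100 ELO" and the caution about sample size. In the draw-free model the error is smallest at a gap of 0 and grows slowly out to 100 points (about 34.7 vs 36.2 Elo after 100 games) but much faster at 300-400 points; a move from 50% to 55% is about 35 Elo, roughly one standard error after 100 games. This only concerns measuring strength, not Richard's separate use of weaker opponents to expose bugs and bad opening lines.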