Author: Steffen Jakob
Date: 05:38:17 05/30/01
On May 30, 2001 at 06:41:46, Bas Hamstra wrote:

>Yes, determining if you did improve your program is problematic. Interesting
>post. But probably even if you use the below method you will find mixed results.
>It has a better FH% and a worse branching factor. It does better against A but
>worse against B. Stronger at lightning and weaker at long tc's. It plays stronger
>at the server though it does worse on the suites. Etc. Apart from that it takes
>a long time to play for example 100 standard games. You can't just play 3 games.
>Sorry if I sound a little pessimistic :-)

That isn't pessimistic. Mixed results are ok. It's then my interpretation
whether I like it or not. But at least I have some data.

Greetings,
Steffen.

>Bas.
>
>
>On May 30, 2001 at 05:57:54, Steffen Jakob wrote:
>
>>Hi all,
>>
>>it seems as if the last modification I made made Hossa play
>>worse. What I need are some good methods for testing. At the moment my
>>testing is rather chaotic. I observe games at ICC and if I see
>>something strange I try to fix it. Then I let Hossa play again at ICC
>>and look how it works. "To see how it works" is surely not a good way
>>to test, and the ICC rating isn't reliable either.
>>
>>I know from others that they do basically the same. I am not happy
>>with this at all. I would rather see a well-defined strategy for
>>testing changes automatically. Here is some short brainstorming, and
>>I would like to get some feedback.
>>
>>- test suites (I don't like test suites very much for testing, because
>>  those positions are mostly very tactical and rather
>>  exceptional). Different test suites should be used which emphasize
>>  different themes (tactics, endgames, pawn endgames, exchange
>>  sacs, ...).
>>  data to compare: #solutions, #nodes, time
>>
>>- static eval tests: I think of a set of positions where I don't look
>>  for a best move but for a static eval value. An eval range would be
>>  assigned to each position, and if the engine's static eval is within
>>  this range, then it matches.
>>  data to compare: #solutions
>>
>>- static eval order: this is similar to the point above. Here I want
>>  to specify a set of ordered positions. The "best" position is
>>  ordered first, etc. This point is interesting: here you can test
>>  whether the engine prefers certain patterns over others.
>>  data to compare: #solutions
>>
>>- effective branching factor test: given is a set of "typical"
>>  positions for the opening, middlegame, and endgame. The engine
>>  searches each position for a certain time and writes down the
>>  effective branching factor.
>>  data to compare: effective branching factors
>>
>>- % fail highs on the first move: similar to the above. For the same
>>  set of "typical" positions the percentage of first-move fail highs
>>  is measured.
>>  data to compare: percentage
>>
>>- self games: a reasonable number of games is played vs. older
>>  versions of the same program. Which openings? Learning? Maybe the
>>  openings should be given so that the results are more easily
>>  comparable.
>>  data to compare: score for different time controls
>>
>>- matches vs. other programs: similar to the above. A reasonable
>>  number of games is played vs. other programs.
>>  data to compare: score for different time controls
>>
>>
>>All tests can be done automatically and produce good results which can
>>be compared directly.
>>
>>Greetings,
>>Steffen.
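
The static-eval range test quoted above reduces to a small harness. A
minimal sketch in Python, assuming some engine_eval() hook that returns
the engine's static evaluation in centipawns for a FEN string; the hook
and the position list are illustrative, not part of Hossa:

    def run_eval_range_test(positions, engine_eval):
        """positions: list of (fen, low, high) tuples in centipawns.
        A position counts as solved if the static eval is in range."""
        solved = 0
        for fen, low, high in positions:
            if low <= engine_eval(fen) <= high:
                solved += 1
        return solved, len(positions)

    # Example with a dummy evaluator that scores everything as 0.25 pawns:
    positions = [("fen-placeholder", 0, 50), ("fen-placeholder", -100, -20)]
    print(run_eval_range_test(positions, lambda fen: 25))  # -> (1, 2)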
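
For the effective-branching-factor test, one common estimate is the
ratio of total node counts between successive iterations of an
iterative-deepening search. A sketch under that assumption; the node
counts in the example are made up:

    def effective_branching_factor(node_counts):
        """node_counts[i] is the total nodes searched at depth i+1.
        Returns the average ratio between successive depths."""
        ratios = [b / a for a, b in zip(node_counts, node_counts[1:])]
        return sum(ratios) / len(ratios)

    # Example: counts for depths 1..5 give an EBF of roughly 4.2.
    print(effective_branching_factor([40, 160, 700, 2900, 12500]))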
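
The first-move fail-high percentage is just two counters maintained
inside the search. A sketch of the bookkeeping with illustrative names;
where exactly record_fail_high() is called depends on the engine:

    class FailHighStats:
        def __init__(self):
            self.total = 0          # all nodes that failed high
            self.on_first_move = 0  # ...where the first move tried caused it

        def record_fail_high(self, move_index):
            self.total += 1
            if move_index == 0:
                self.on_first_move += 1

        def percentage(self):
            return 100.0 * self.on_first_move / max(1, self.total)

    stats = FailHighStats()
    stats.record_fail_high(0)   # fail high on the first move tried
    stats.record_fail_high(2)   # fail high on the third move tried
    print(stats.percentage())   # -> 50.0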
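
For the self-play and match tests, the raw score can also be translated
into an approximate Elo difference. The post only compares scores
directly; the conversion below is the standard logistic rating model,
added here as one common way to make match results comparable (it is
undefined for 0% and 100% scores):

    import math

    def elo_difference(wins, draws, losses):
        """Elo difference implied by a match result."""
        score = (wins + 0.5 * draws) / (wins + draws + losses)
        return -400.0 * math.log10(1.0 / score - 1.0)

    # Example: 55 wins, 20 draws, 25 losses -> about +108 Elo.
    print(elo_difference(55, 20, 25))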