Author: Mogens Larsen
Date: 04:15:14 05/10/00
On May 09, 2000 at 21:57:02, Eelco de Groot wrote:

>Hello Mogens,

Hello Eelco,

>I don't want to restart this whole discussion and I haven't followed more than
>half of it but I think some of your criticisms were a little harsh.

That is probably true. But if you read some of my posts you'll see that a lot of the so-called criticism is formulated as questions. Most of the questions were either addressed superficially or not at all by the tester. That made my remarks and questions more rash and harsh than intended, which is indeed unfortunate. If feelings got trampled on I'm truly sorry, but I believe that most of my remarks were justified. I also believe that the study could have been conducted much better considering the unique hardware available to Chessfun.

>In practically any experiment there are disturbing influences and I think there
>were some here too. The biggest influence I could see, one that possibly could
>have been avoided is that in the beginning some matches were played with
>book learning on. If I am mistaken here I hope that somebody can correct me. I
>know for Rebel that book learning can be disabled, for Crafty this can be done
>with the command learn=0. I don't know exactly if those commands can be used in
>the Hiarcs interface or in WinBoard for Crafty and if book learning can be
>disabled for Fritz 6a too but especially for a repeated Nunn-like test it would
>be desirable.

Okay, I think that is clear. There are more questions of a similar nature, but they were not addressed either. Autoplayer is the main culprit in my opinion, especially if you want to compare ponder on with ponder off.

>Apart from that I think using the Nunn positions was a good idea from Chessfun,
>if the object was to see how a. time control or b. pondering on one or two
>machines affects the strength of an engine combined with use of the timing
>algorithms involved. I think any not too imbalanced early middlegame position
>could be used for these experiments if each engine gets to play both colours. In
>practice of course also the opening books affect the strength of a program (as
>opposed to engine) but since book moves can be played very fast just starting
>from a Nunn position does not make much difference for the timing algorithm. The
>big down side I see in using books and learners is that the books also have a
>big randomizing effect on the results and secondly if the two learners in a
>match don't cancel each other out that can mean that the results don't stabilize
>even after large numbers of games. They are just big noise generators if you
>want to look at the effect of pondering or time control. Even if you would
>consider both books and both learners equally good you need many more games to
>determine differences in engine strength this way.

Nunn positions are okay for testing I guess, but can't be used for accurate strength assessment IMO. They might favor one engine over another or they might not. A common book, or a special book for each engine, is to be preferred, since it's the strength of the program we're interested in, which is why learning should be included as well. Nunn positions aren't the ten commandments, nor are they written in stone :o). Try looking at the Nunn tests at Chessfun's site and reassure me that they are better.

>Comparing tests on one machine with results on two machines is interesting too
>to see if there is a discernible effect of how limited resources get divided
>etc.

That's true if you have control over the other parameters. I'm not entirely sure if that was the case.

>There could have been some influence from using other programs on just the
>Windows 98 computer but I don't think that can have had much influence on the
>results. I think it is easy to test how much a program gets slowed down if you
>use it with other programs running, by looking at times needed to reach a
>certain ply depth.

I don't think it's that important either, assuming that Chessfun's measurements are correct, but they should have been mentioned and their relevance examined.

I'm a little disappointed that the testing stopped and I think I'm to blame for that. So if Chessfun decides to resume her testing, I promise not to make remarks about its validity.

Sincerely,
Mogens