Author: Ricardo Gibert
Date: 08:03:15 06/23/99
I have a problem with this engine vs engine testing that gets posted here. I'm not trying to single you out on this; you just happen to be the latest such post. The problem I have is that people draw misleading conclusions from such testing. The misleading conclusion is that a "favorable" record of "this" program over "that" program "implies" that "this" program is stronger than "that" one.

To take a couple of examples among humans: should we conclude that M. Tal was a stronger player than Fischer because he had a +4 =5 -2 record against him? Was E. Geller a stronger player than Fischer because he had a +5 =2 -4 record against him? Am I a stronger player than Jack Peters because I have a +2 =0 -0 record against him? Peters outrates me by 300 points! I'm not even close! Practically everyone agrees Fischer was the stronger player. These scores reflect more about luck and the relative idiosyncrasies of the respective players than anything else. In engine vs engine testing, I think the idiosyncrasies play an even greater part in the results.

The point is, how an individual (program) performs against another individual (program) is a very poor benchmark of relative strength. A better indicator is how well an individual performs against the general population of players. After about a hundred games, we can draw a much more reasonable conclusion about relative strength. A large number of games against varied opposition is much more reliable. People tend to draw the wrong conclusions from such "testing".

What I want out of a chess playing program is a strong opponent that effectively helps me improve and can be reliably used for analysis. To this end, a much better way to determine a program's strength would be to have it play about a hundred games on ICC against titled players at a slow time control. Compute the average rating held by the computer and the standard deviation; then you can get an idea of the program's strength. (A sketch of this computation follows at the end of this post.)

To me, the recent Paderborn tournament does not tell you much of anything about how strong the programs were: not enough games, unequal hardware, and the opponents were not human. When you purchase a chess playing program, your primary interest is NOT how well it will do against other programs, but rather how well it will play against YOU in practice, how useful it is in preparing you to play your HUMAN opponent in your next tournament, and how much you can learn from it. The program that does best against strong human opponents is your best bet.

Having said all that, I don't mean to imply that engine vs engine testing is without value. The results are not without interest and independent significance. I can imagine performing such testing can be quite fun, perhaps akin to playing fantasy baseball. Moreover, despite what I said about Paderborn, I followed the event very closely. The games were very interesting and I drew my own conclusions about how computers play, but I have no idea which is the stronger program. I have a better idea of which program had the strongest hardware!

As for what is a reasonable motive for the participants for playing in Paderborn:

1) It pays to advertise (especially when luck is on your side)
2) "Because it's THERE" (like Mount Everest)
3) "My boss told me to go"
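Here is a minimal sketch of the rating computation suggested above, assuming you have recorded the rating the program held after each of its rated games; the sample values and the number of games shown are made up purely for illustration:

    import statistics

    # Hypothetical ratings held by the program after each rated game
    # against titled players at a slow time control; in practice you
    # would collect around a hundred such samples.
    ratings = [2405, 2412, 2398, 2421, 2433, 2417]

    mean_rating = statistics.mean(ratings)     # average rating held
    rating_sd = statistics.stdev(ratings)      # sample standard deviation

    print(f"Average rating:     {mean_rating:.0f}")
    print(f"Standard deviation: {rating_sd:.0f}")

A small standard deviation over a large sample of varied opposition is what makes the average meaningful; a handful of games against a single opponent tells you very little, which is the point of the post.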