Author: Stephen A. Boak
Date: 12:19:22 12/24/99
On December 24, 1999 at 05:48:58, John Warfield wrote:

> Since these games are 40/2, will this count in predicting Rebel's rating? I
> assume that it will count.

Hi John, a bit of philosophy here (rhetorical questions, no need to answer)-- To whom do you address your question? Who can or should determine what will 'count'?

Anybody can calculate (or predict) SSDF, FIDE, USCF, ELO-type, or TPR ratings for any chess program, using whatever data they have available. Just get the test data (including opponent ratings, conditions and results), grab a pencil and some paper, or a computer or calculator, and do the mathematics.

The real issue, however, is not the math itself. It is the issue you have raised: what should be counted (included) in the calculation. To be believable, the person doing the calculation should show its basis, and it is up to the reader to accept or reject, believe or disbelieve the published figures. The basis itself must also be believable (trustworthy). That is, the underlying games, conditions and results should be as free from error and controversy as possible. Otherwise the published results, and hence the rating calculations, may be in doubt.

It is just like science: there are many experimenters, many published results, and often many differing opinions on how to interpret those results. When someone points out a testing flaw, or an uncontrolled variable that might have affected another person's published results, this may raise doubts about the meaning, usefulness or accuracy of those results. In the same way, there may be doubts about some rating calculations, but as long as the basis for the calculation is shown, all are free to interpret the results for themselves.
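For anyone who wants to try the "pencil and paper" part, here is a minimal sketch of one common way to estimate a performance rating: find the rating at which the standard Elo expected-score curve predicts exactly the points actually scored. The function names and the sample data are made up for illustration and are not Rebel's results. Other conventions exist (FIDE, for instance, uses a lookup table of rating differences), so treat this as one possible method, not the method behind any published figure.

```python
def expected_score(rating, opponent):
    """Elo expected score for one game (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(opponent_ratings, points, lo=0.0, hi=4000.0):
    """Rating at which the expected total equals the points actually scored.

    Solved by bisection; undefined for 0% or 100% scores, where the
    answer would be unbounded.
    """
    if not 0 < points < len(opponent_ratings):
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    for _ in range(100):
        mid = (lo + hi) / 2.0
        total = sum(expected_score(mid, r) for r in opponent_ratings)
        if total < points:
            lo = mid  # candidate rating too low to explain the score
        else:
            hi = mid  # candidate rating high enough; tighten from above
    return (lo + hi) / 2.0


# Hypothetical data: four games against 2500-rated opposition, 2 points scored.
opponents = [2500, 2500, 2500, 2500]
print(round(performance_rating(opponents, 2.0)))
```

The arithmetic is the easy part. Which games go into `opponent_ratings`, and whether the conditions behind them can be trusted, is the judgment call discussed above.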
One can include the Russek match results in one's own assessment of a Rebel rating, if one so chooses. My preference is to calculate ratings for Rebel products based on open testing and verifiable results published by Ed Schroder and the Rebel Team.

Ed has the highest motivation (Rebel programs are his business) to ensure his testing is thorough and will stand up to public examination. He offers prize money to FIDE-rated GMs and IMs to ensure they are motivated to give their best effort against his program. Ed releases the testing conditions in advance, for comment by the world. His games are shown 'live' on ICC, so all can see the moves of the games in his testing. A trusted arbiter (good reputation) is at the scene of the human player to ensure fair play. His titled opponents are free to make their remarks to the world after the game, about the game and the play of Rebel software. All viewers of the 'live' game on ICC may observe and make comments.

Ed utilizes the fastest and best hardware he can afford and trust. He makes sure his settings are those he believes are best for the particular test he undertakes (for example, he determines whether the Anti-GM feature is switched on or off, depending on his latest assessment of his own program). Although this (Anti-GM) may be a test variable that changes from game to game, Ed chooses which variables he wants to test in his experiments and which he believes will allow his program to play the best it can play. I have no problem with this, as long as it is disclosed and verifiable.

The trusted arbiter has a copy of the program to verify that all moves were made by the software Ed claims he used. Owners of the same software (or a relatively recent, similar version) can verify moves to a large degree using their own copy of Rebel.
As long as the Rebel Team continues to disclose its verifiable testing conditions, and as long as those conditions seem trustworthy to me, I will continue to use those results in my calculations.

Maybe I missed a posting, but I don't know what prize money, if any, is offered to IM Russek to ensure his motivation is there for the upcoming match with Rebel. In my mind, this is one of several things that cause me to lean toward using results published by the Rebel Team and not others.

Certainly we should not always trust every vendor that makes claims about its software. That goes to the trustworthiness issue and the old sayings 'consider the source' and 'let the buyer beware'. However, Ed and the Rebel Team have not made any outlandish claims to my knowledge, and they have not tried to inflate their evidence to show Rebel is super-GM in strength (say, over 2600 FIDE ELO equivalent). They have established a fine series of tests, under very controlled circumstances, verifiable to a highly reasonable degree, and have let the chips fall where they may (win, lose or draw).

Their experiments seem to show Rebel is about 2500 FIDE ELO strength at this time (my calculations using their published results). More games will perhaps make the calculation more believable, and the ELO estimate itself may change after more games have been played. And I expect Ed to continue to improve his Rebel program, so it may even grow stronger as he makes continuing programming additions or changes. Time, and good experiments, will tell.

--Steve Boak
Last modified: Thu, 15 Apr 21 08:11:13 -0700