Author: Bruce Moreland
Date: 14:49:20 02/23/00
On February 23, 2000 at 15:01:14, Bertil Eklund wrote:

>On February 23, 2000 at 12:33:50, Bruce Moreland wrote:
>
>>On February 23, 2000 at 11:08:43, blass uri wrote:
>>
>>>Shredder 2 was not tested on the fast hardware because the SSDF always uses
>>>fast hardware for new programs and old hardware for old programs.
>>
>>Has anyone considered that this might be a major source of error, perhaps
>>rating inflation?
>
>Why? Any suggestions of what to do instead? Do you think humans should refuse
>to play opponents rated 200 Elo higher or lower?

There is a major assumption buried in the Swedish list: that these ratings have some correlation with ratings on the human list. The best way to make the ratings correlate with the human list would be to have the programs play against a variety of humans. Instead, the games are played exclusively between machines.

If you speed up a program's hardware, the program will become stronger against other computers; there is ample evidence of this. But it is not a foregone conclusion that the program will become stronger against humans by the same amount.

The programs don't differ that much from each other, and it is possible that when you increase hardware, you allow the faster program to superset the slower one: it sees the same stuff, just better and faster. What is the result of this? I don't know, but it is possible that the rating gain is more extreme than should be expected. Against humans, the program could still be missing the same things it missed before, if it has a problem with long-term strategic issues. Humans score against the programs differently. They have different strengths and weaknesses, and a hardware increase may not significantly increase the computer's strengths or reduce its weaknesses.

>>"Always" is a bad word to use when you are trying to get an accurate result
>>from a system that is designed to produce accurate ratings within a pool
>>where everyone plays everyone under the same average conditions.
>Have you ever thought about that the human pool works in the same way, except
>for being much bigger?

Humans play against other humans, and there is little interest in correlating the human ratings with computer ratings, whereas there is much interest in doing the reverse: people want to take the ratings on the SSDF list and compare them with human ratings. Since the two pools have been distinct for a long time, it is possible that they have drifted apart. If you don't measure this, nothing is proven just because the numbers seem to "look" right. A mathematical process that is used to measure and predict must have some basis other than that it seems to have done OK so far, based upon our own feelings about the issue.

>There is a slight inflation because some older programs have no
>learning-function.

The newer programs have private access to the older ones, and not vice versa, but that is another issue.

>Of course the pool should be calibrated, but not because of someone's
>gut feelings or wild guesses.

If someone wants to publish a scientifically valid list, they should expect questions from outsiders about the methods used, and react to them a little better than this, I think. Perhaps I am wrong, but I have pointed out what I think is a potential problem with your list. You can ignore me if you wish; I suffer not at all if you do.

bruce
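The point about the two pools drifting apart can be made concrete with the standard Elo expected-score formula (a simplification here; the SSDF's exact calculation method may differ). The sketch below shows that only rating *differences* within a pool affect predicted results, so a constant offset between a closed computer pool and the human pool is invisible until the pools are calibrated against each other:

```python
# Hedged sketch: the standard Elo expected-score formula, used to
# illustrate why ratings are only meaningful relative to a pool.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (0..1) of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Within one pool, only the rating difference matters: a 100-point gap
# predicts the same score no matter where the two ratings sit.
assert expected_score(2500, 2400) == expected_score(2600, 2500)

# Equal ratings predict an even score.
assert abs(expected_score(2400, 2400) - 0.5) < 1e-12

# Consequence: shift every rating in a closed pool by +150 and every
# game prediction inside that pool is unchanged. Nothing played within
# the pool can detect the offset -- only games against an outside
# reference (e.g. rated humans) could calibrate it away.
```

This is why "the numbers look right" is not evidence: the machine-only pool could carry an arbitrary offset (or a scale distortion) relative to the human list, and no amount of machine-versus-machine play would reveal it.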