Author: Bruce Moreland
Date: 17:54:29 05/22/01
On May 22, 2001 at 05:03:14, Ratko V Tomic wrote:

>> If you tune your TV to an unused channel and watch the
>> static, your brain will create meaning.
>
>You are not trying to hint that the spiritual masters from
>Sirius are not really sending the encoded messages in the
>TV screen noise? :)
>
>Actually, your brain will create a meaning out of some
>noisy electric signals caused by photons striking your
>retina. You may think of it as the "real-world," but it is
>just some private little toy model made up by your brain
>as it correlates all those seemingly chaotic electric pulses
>(i.e. any meaning is a "created meaning").
>
>> Since they are designed to produce a specific result, it's
>> hard for me to understand why people praise any suite's
>> predictive powers.
>
>They do produce ratings within hours, which may take months
>for SSDF/ELO method. So, they're a perfectly valid model for
>predicting SSDF list (whether that is important is another
>matter).

They predict the SSDF rating list just fine. So would a function that simply spit out the appropriate rating based upon the input "Rebel" or "Tiger" or what have you. It would also be possible to create a test suite that orders the programs the opposite of the way they are on the SSDF list, or one that puts Crafty on top by 100 points, or one that rates Fritz at 1450.

If I know the list beforehand, and I have the programs on the list, and it's my goal to create a program that rates the programs the way they are on the list, clearly I can do it. It's just a matter of fiddling around.

We saw what happened with one suite a year or two ago. Someone created a suite based upon several programs, but they didn't include Crafty. Someone tested it with Crafty, and Crafty scored 2900 or something.

It's not enough that the test produces the SSDF list. There has to be a reason it produces the SSDF list. It has to produce the SSDF list even if you improve the hardware. And it has to produce the SSDF list even if you add new programs to the list. The question is whether a suite can do these two things.

Tactical performance is a large component of chess strength, so it might have something to do with position on the SSDF list, but not necessarily. A program has to have a good book, (these days) a good learning facility, good positional play against computers, good ending, etc. If a simple tactical suite proposes to predict all of these factors, clearly something is wrong.

>The nontrivial/significant aspect of the suites is that
>there _exist at all_ so microscopically small sub-sets of all
>the chess positions which can statistically predict program's
>performance in real games. That is, the suites extract much
>more information from tiny samples of positions, far beyond
>what a coin tossing model (the SSDF method) would from similar
>size samples of positions. After a few dozen positions in a
>single game, the coin tossing model (regular ELO) predicts
>almost nothing about the future performance of a program,
>while a good suite will produce predictions (after its few
>dozens evaluations) which the coin tossing model will take
>3+ orders of magnitude more positions to match for accuracy.
>
>> It's like a magician who puts a bunch of colored balls in a
>> bunch of colored boxes, then covers his eyes and tells you
>> which color box has which color ball inside it.
>> Of course he'd know which box contains which ball.
>> He's the one who put them in boxes.
>
>You could argue the same way against any physics experiment which
>produces predicted results. Indeed, the physicist arranges the
>experimental set-up just right to make it follow the desired behavior.

I believe that I have a valid objection against pre-ordained test suite results, but I don't believe that I've destroyed all of science. If a scientist keeps running experiments until he gets the result he wants, while in the meantime discarding experiments that produce the opposite result, I agree that this is similar, but a scientist shouldn't be doing that.

>The essential aspect is that in order to actually "fix" such "rigged"
>arrangement, the physicist uses theoretical models which
>are much more economical to run through than the real thing (the
>actual reality run on the full set-up). That is the point of the
>models, like the toy aircraft one runs in the wind tunnel or on
>a computer simulator -- they're much cheaper and less risky to
>run than the full scale reality run. And for the magician, yes,
>he needs the economical and effective models of human perception
>and reasoning loopholes to rig his set-up, too.

There has to be some correlation between the test and reality. Otherwise, I could just write a function that takes the name of the program, its hardware, and its version, and outputs a rating. At its simplest, the function could be a simple table lookup (there is a sketch of this at the end of this post). This function has perfect predictive powers for the programs it is tested with, but it is completely useless. More complicated versions of this function are also completely useless.

What I am arguing is that it is obviously possible to make a complicated version of this function, one that uses actual solution times produced by the programs, and which is also completely useless. The degree to which the current suites do this is open to debate, but the possibility should absolutely not be overlooked.

>The same goes for the test suites or a careful analysis of a single/few
>games and the program evaluation outputs. Without relying on such
>economy of efforts, the models (or seeing patterns in apparent
>randomness), one could never improve a chess program (e.g. if
>one had to wait months for thousands of games, where only the
>final outcome, i.e. 1.58 bits/game is extracted, in order to
>find out whether any given change strengthens the program).

I believe that in many cases, programs are improved through intuition. A programmer will see a situation that isn't handled properly, and will apply a tweak to get the program to do the right thing. The assumption is made that the improvement will work in general situations. I think that a positional or significant tactical change is very rarely proven to be better. If you can do the same thing in less time, you've almost certainly made an improvement. But if you increase your doubled pawn penalty, there's no way you can tell for sure that this did anything positive.

bruce
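P.S. To make the useless lookup "predictor" concrete, here is a minimal sketch in C. The program names are real, but the ratings in the table are made up; the only point is that a function like this reproduces the list it was built from perfectly and has nothing to say about anything else.

#include <stdio.h>
#include <string.h>

/* A "predictor" that reproduces a known rating list by table lookup.
   The ratings below are made-up numbers, not real SSDF figures. */
struct entry {
    const char *name;
    int         rating;
};

static const struct entry table[] = {
    { "Rebel", 2650 },   /* made-up number */
    { "Tiger", 2640 },   /* made-up number */
    { "Fritz", 2630 },   /* made-up number */
};

/* Returns the "predicted" rating, or -1 for a program that was not
   used to build the table. */
int predict_rating(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].rating;
    return -1;
}

int main(void)
{
    printf("Fritz:  %d\n", predict_rating("Fritz"));   /* "predicted" perfectly */
    printf("Crafty: %d\n", predict_rating("Crafty"));  /* nothing useful to say */
    return 0;
}

It is perfect on the three programs it was built from and worthless on anything new, which is the Crafty story above with the nonsense made explicit.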
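P.P.S. On the "no way you can tell for sure" point, a rough back-of-the-envelope shows why. Assuming independent games, a score standard deviation of roughly 0.4 points per game (a typical figure when there are many draws), and a two-sigma criterion, the number of games needed to see a small Elo gain comes out in the thousands. This is only a sketch of the statistics, not a description of how anyone actually tests.

#include <math.h>
#include <stdio.h>

/* Rough estimate of how many games it takes to see a small Elo gain
   at about the two-sigma level. Assumptions: independent games,
   per-game score standard deviation of about 0.4, normal approximation. */
int main(void)
{
    double sigma = 0.4;   /* assumed std. dev. of a single game score */
    double z     = 2.0;   /* about 95% confidence */
    double elo;

    for (elo = 5.0; elo <= 40.0; elo += 5.0) {
        /* expected score for an Elo advantage d: 1 / (1 + 10^(-d/400)) */
        double escore = 1.0 / (1.0 + pow(10.0, -elo / 400.0));
        double delta  = escore - 0.5;   /* edge over an equal opponent */
        double games  = (z * sigma / delta) * (z * sigma / delta);
        printf("%5.0f Elo gain: expected score %.4f, about %6.0f games\n",
               elo, escore, games);
    }
    return 0;
}

A 10 Elo gain works out to roughly three thousand games before it stands out from noise, which is why a doubled pawn penalty tweak is so hard to verify by playing games, and why people reach for suites and intuition instead.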