Computer Chess Club Archives



Subject: Re: Comments of latest SSDF list

Author: Rolf Tueschen

Date: 14:19:19 05/25/02



On May 25, 2002 at 16:00:01, Peter Fendrich wrote:

>You and some other guys keep saying this ("Nothing") but that is not a good
>interpretation of the list.
>The ratings are the best predictions of each engine's strength. Fritz is still
>the best prediction of who is the best, even if it's a quite unsafe prediction.

This is absolutely false, although at first glance it looks both true and false.

- it's true because right now, at the moment the information stream was cut off,
FRITZ was 8 points ahead of the second-placed program (but the moment of the cut
matters; without further information we cannot judge whether the cut was well or
badly chosen)

- it's false because (and this is trivial) with an 8-point lead and a margin of
error of 30 points anything could happen in future: FRITZ could end up in place
1, place 2, or even place 5. It is absolutely false that first place at the
chosen moment of the cut has any more predictive power than second place at that
same moment; either one could be followed by place 1 or place 5.

- it's absolutely false overall, because we have _no_ information about the
future. Therefore Sandro Necchi and all the critics of the SSDF are right. And
nobody has even begun to talk about the different hardware, the different
samples of opponents, or the validation of the data against human chess players,
which is what gives the Elo numbers their meaning.
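
For readers who want to see the arithmetic behind "8 points with a margin of error of 30", here is a minimal sketch. It rests on assumptions that are not stated in the thread: that the +/-30 is a 95% interval, that both programs carry the same margin, and that the two rating errors are independent and normally distributed. The function name is made up for illustration. The sketch covers sampling error only; it says nothing about hardware, opponent samples, or what a future list will show.

```python
from math import sqrt
from statistics import NormalDist


def prob_leader_truly_ahead(lead, margin, confidence=0.95):
    """Chance that the listed leader's true rating is the higher one.

    Assumes (hypothetically) independent normal errors and the same
    +/- margin, at the given confidence level, on both ratings.
    """
    std_normal = NormalDist()
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = std_normal.inv_cdf(0.5 + confidence / 2)
    sigma_single = margin / z            # standard error of one rating
    sigma_diff = sqrt(2) * sigma_single  # standard error of the difference
    return std_normal.cdf(lead / sigma_diff)


p = prob_leader_truly_ahead(lead=8, margin=30)
print(f"{p:.2f}")  # prints 0.64
```

Under these assumptions the figure is clearly above a coin flip and clearly short of certainty, which is the gap both sides of this exchange are arguing over.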


>The current error margin just tells us that we can't be 95% sure. Lower the
>expectations of probability and the error margin intervals will shrink.
>
>Peter
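
The quoted point about shrinking intervals can be sketched numerically. This is only an illustration, assuming a normally distributed rating error and a published margin of +/-30 at 95%; the function name and the confidence levels tried are hypothetical choices, not SSDF figures.

```python
from statistics import NormalDist


def margin_at(confidence, margin_95=30.0):
    """Rescale a 95% margin to another confidence level (normal errors assumed)."""
    std_normal = NormalDist()
    # Recover the standard error from the published 95% margin.
    sigma = margin_95 / std_normal.inv_cdf(0.975)
    # Half-width of the two-sided interval at the requested confidence.
    return sigma * std_normal.inv_cdf(0.5 + confidence / 2)


for c in (0.95, 0.80, 0.68, 0.50):
    print(f"{c:.0%}: +/- {margin_at(c):.1f}")
```

The interval narrows to roughly +/-20 at 80% and roughly +/-10 at 50%, but only because one accepts being wrong more often; the underlying data are unchanged.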

The presentation of the SSDF ranking list tells us something. Although the
SSDF's defenders always fall back on the argument that the list is not science,
it is still designed to give its readers the impression of a scientific project,
with its sophisticated margins and probabilities. The critic, however, discovers
that the SSDF does not obey the simplest rule of experimentation, namely
controlling the variables and holding them constant, so that the Elo numbers of
the rating can be read as a result. If everything is flexible, you will never
know what your results stand for. That you and the SSDF have no misgivings is
simply a consequence of your own expectations. As long as the results "look"
normal, you think that your test design must be OK. But in chess testing it is
also well known that a result can look OK for the wrong reasons.

Rolf Tueschen




Last modified: Thu, 15 Apr 21 08:11:13 -0700
