Computer Chess Club Archives



Subject: Re: SSDF validation proposal

Author: Peter Fendrich

Date: 15:45:20 01/08/00



On January 06, 2000 at 23:27:31, Robert Hyatt wrote:

>On January 06, 2000 at 20:24:38, Peter Fendrich wrote:
>
>>On January 05, 2000 at 22:36:20, Robert Hyatt wrote:
>>
>>>On January 05, 2000 at 18:47:14, Peter Fendrich wrote:
>>>
>>>>On January 05, 2000 at 16:32:12, Robert Hyatt wrote:
>>>>
>>>>>On January 05, 2000 at 14:47:11, Chris Carson wrote:
>>>>>
>>>>>>In my opinion SSDF does not need more external
>>>>>>validation (some human games are included in the ratings).
>>>>>
>>>>>This is not correct.  Human ratings were used (IIRC) maybe up to 1993 at
>>>>>the latest.  Seven years washes _all_ the 'humanness' out of the SSDF rating
>>>>>pool, since the programs have played thousands of games since the last human
>>>>>game was included.
>>>>
>>>>The human results used are no more washed out today than in 1993. The results
>>>>are still used in the same way, with the same impact, as then. The ordinary
>>>>K-constant doesn't apply here; the human results are only used to adjust the
>>>>level of the list.
>>>>
>>>>That doesn't make it a human list in any way. And there are problems:
>>>>a) There are far too few games between humans and chess programs.
>>>>b) Do games played 10 years ago still give the same information?
>>>>Probably not, because of the increased knowledge of how to play against
>>>>computers. I would think that humans are much better prepared for the
>>>>computer style today than 10 years ago.
>>>>
>>>>As a pure program-vs-program rating list, giving the differences between
>>>>chess programs, the ratings are very accurate IMO. The adjustment to human
>>>>levels, however, only helps us get a rough estimate of how to compare these
>>>>ratings to human ratings.
>>>>//Peter
>>>
>>>
>>>Sorry, but I disagree.  The human ratings were against programs that are over 7
>>>years old.  Since then it has been _only_ computer vs computer... with no humans
>>>in the pot to influence the pool.  You have nearly 7 generations of programs
>>>that have played each other since the last human vs program game was included
>>>for rating purposes...
>>
>>I'm not sure where we disagree, but I think there is some misunderstanding
>>about how the human games are used. Despite the age of these games, they still
>>have "the same" influence on the rating list as back in '93.
>
>
>Not exactly.  The ratings were adjusted to correlate with some human vs computer
>results that were known.  But that set the ratings for a couple of programs back
>in 1992-93.  Since that time, thousands of new computer vs computer games have
>been played.  And as I mentioned, a small change to a program can produce a wide
>gap in Elo performance, with no checks and balances to keep the list in line
>with the old Swedish federation ratings.  The ratings were never calibrated to
>FIDE ratings, and after 7 years there is little doubt that they have drifted
>at least 200 points above FIDE.
>
>At least I don't believe any program is a 2700 player on today's hardware,
>which would put it in the top 10 or so of the world's best players.  I don't
>believe they are in the top 100 yet...
>
>>
>>>
>>>I.e., the computer rating pool was 'seeded' with human ratings, but then the
>>>two pools became 100% disjoint.  Ratings today have _nothing_ to do with FIDE,
>>>or any other rating pool...
>>
>>The two pools were disjoint from the very start and still are. The order of
>>members in the list and the differences between their ratings are accurate. The
>>absolute rating figures, however, are hard to compare with other rating lists,
>>and that goes for all other rating pools as well. That's why I call it a rough
>>adjustment to the human levels.
>>
>>//Peter
>
>
>Right...  but we probably don't agree on the adjustment.  I say SSDF-200 is
>an _upper_ bound on the program ratings, when comparing them to FIDE ratings.
>200 might be too small, but probably not by a lot.

No, we don't agree here. But we have also reached the point of speculation, and
my opinion here is perhaps not as strong as yours.

The results from the yearly Aegon tournament in Holland do support the SSDF
level. Unfortunately, that tournament has ceased to run.
I think that the Rebel challenges at least don't contradict the SSDF level,
assuming that Rebel is about even with the top programs.
So there are some results supporting the SSDF level, and I know that your
experience, mainly from ICC, does not support it. The ICC GMs are more
specialized in playing against computers than is generally the case, and that
possibly biases the results.
As said before, my opinion on this issue is not very strong - I would like to
see more results from human-computer games under tournament conditions and in
different kinds of events.
//Peter
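For context on the rating mechanics debated above: in the Elo system only rating
differences determine expected scores, so a pool-wide offset (the rough
"adjustment to human levels" Peter describes) changes the absolute numbers
without affecting the internal order or gaps. A minimal sketch using the
standard Elo formulas, with made-up program names and a hypothetical offset;
this is illustrative, not the SSDF's actual procedure:

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(rating, expected, actual, k=16):
    """One ordinary K-constant update step (which, per the thread, is
    NOT how the SSDF uses its old human-game results)."""
    return rating + k * (actual - expected)

# A pool-wide offset shifts the level of the whole list but leaves every
# internal difference, and hence every expected score, unchanged.
pool = {"ProgA": 2550, "ProgB": 2480}   # hypothetical SSDF-style ratings
offset = -200                           # hypothetical calibration toward FIDE
shifted = {name: r + offset for name, r in pool.items()}

assert expected_score(pool["ProgA"], pool["ProgB"]) == \
       expected_score(shifted["ProgA"], shifted["ProgB"])
```

This is why the thread can agree that the list's internal differences are
accurate while still disputing the absolute level: the offset is the only part
tied to human play, and nothing in program-vs-program games constrains it.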




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.