Computer Chess Club Archives



Subject: Re: ELO inflation effect ... and SSDF

Author: Rolf Tueschen

Date: 18:44:41 10/09/02



On October 09, 2002 at 12:59:24, Robert Hyatt wrote:

>On October 09, 2002 at 05:54:18, Rolf Tueschen wrote:
>
>>On October 09, 2002 at 04:52:40, GuyHaworth wrote:
>>
>>>
>>>Totally agreed:  only the differences between the ELO numbers are relevant.
>>>
>>>I believe there is an inflation effect in the ELO system.  Sadly, investigating
>>>this - by theory or simulation - hasn't got to the top of my 'to do' list yet.
>>>
>>>Anyway, the more games played, the narrower the confidence bands on ELO figures,
>>>but the greater the inflation.
>>>
>>>I believe it was for this reason, or for the sake of credibility, that SSDF
>>>knocked back the absolute numbers a couple of years ago.  Maybe they knocked 100
>>>points off or something?
>>>
>>>Other rating systems, like Thompson's for the PCA, maybe do the rating better
>>>with less inflation, but they haven't been widely adopted.  Perhaps that's a
>>>pity.
>>>
>>>g
>>
>>
>>In Germany I read an interesting idea from Detlev Pordzik, aka Elvis, that SSDF
>>should lower their values by 250 Elo points. That would reduce the maximum
>>ratings to 2500-something.
>>
>>Again, as I've written hundreds of times, SSDF could do that, but the inherent
>>and worst error in SSDF is the testing of machines from DIFFERENT pools! Exitus.
>>The End.
>>
>>Rolf Tueschen
>
>
>You lost me.

Don't say such things without any emergency in sight!


>
>The "pool" the SSDF tests is the pool of computer chess programs, and in that
>regard, I don't see where they make any mistakes.  Yes, they play games between
>current programs and old programs.

So you can't see any mistakes. OK. And what about the control, or the constancy,
of the variables? Did you forget that old programs have no learning at all? The
differences in opening books? And so on and so forth?!

Bob, next week you'll tell me that the handicapped athletes from the Paralympics
could just as well "run" against the US 100-meter sprinters! They are from the
same pool, no? All the same species. <cough>

I thought that we (at least) would know that it makes no sense to test several
free-floating variables at the same time. I mean, what would the results tell us?
Or is it of great interest to you to obtain statistical values for the obvious?
That old is weaker than new? I mean, isn't it nonsense to prove that slow
machines are weaker than fast ones?

My God, and you start a debate about inflation? I can't get my head around
what's going on here. Can't you see the ugly consequences if you give your
blessing to such obvious nonsense?

I know - you're playing games with me, right?








> Yes this tends to inflate their absolute ratings at the top of the list.  But
>the "pool" is valid, and the ratings do tend to reflect results between any two
>players in the SSDF pool.

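Hyatt's distinction between absolute ratings and results between two players in the pool can be made concrete. Under the usual logistic Elo formula the expected score depends only on the rating difference, so subtracting the same constant from every program (Pordzik's 250 points, or the adjustment Haworth recalls SSDF making) leaves every in-pool prediction unchanged. A minimal sketch, assuming the logistic curve on a 400-point scale; `expected_score`, `shift_pool` and the ratings are hypothetical, not SSDF data or SSDF's actual method:

```python
from itertools import combinations


def expected_score(r_a: float, r_b: float) -> float:
    # Logistic Elo curve on the conventional 400-point scale (an assumption,
    # not necessarily the exact formula SSDF uses)
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def shift_pool(ratings: dict[str, float], offset: float) -> dict[str, float]:
    # Apply the same offset to every rating in the pool
    return {name: r + offset for name, r in ratings.items()}


# Hypothetical pool, for illustration only
pool = {"A": 2750.0, "B": 2600.0, "C": 2400.0}
lowered = shift_pool(pool, -250.0)

for x, y in combinations(pool, 2):
    before = expected_score(pool[x], pool[y])
    after = expected_score(lowered[x], lowered[y])
    print(f"{x} vs {y}: {before:.3f} -> {after:.3f}")
```

Every pairing prints the same expected score before and after the shift: lowering the list changes the labels, not what the list predicts inside its own pool.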

And valid with respect to which variables, please?



>
>
>IE if you simply pick _any_ two players on the SSDF list, and compare their
>ratings, and then play a match between them, their ratings will pretty closely
>predict the match outcome.  And that is as it should be.

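Hyatt's claim that two ratings predict a match outcome rests on the Elo expected-score curve. The sketch below assumes the common logistic form with the conventional 400-point scale (whether SSDF computes its list with exactly this curve is an assumption here); `expected_score` and `predicted_match_points` are hypothetical helper names and the ratings are made up:

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Logistic Elo curve, 400-point scale (assumed, not confirmed as SSDF's method)
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def predicted_match_points(r_a: float, r_b: float, games: int) -> float:
    # Expected number of points A scores over a match of `games` games
    return games * expected_score(r_a, r_b)


# Hypothetical ratings, for illustration only
print(round(predicted_match_points(2650.0, 2550.0, 40), 1))  # prints 25.6
```

A 100-point gap works out to roughly 64% of the points, about 25.6 out of 40. That is only an expectation; an actual 40-game match can land several points away from it by chance.
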
Are you sure? So you can't sleep until you get the new results? That CST version
1 on a PI is weaker than, say, Fritz 7 on a PIII 2.500? Wow!


>
>The ratings will _not_ predict how the programs will do against programs not in
>the SSDF list, nor against humans with FIDE ratings that come from a completely
>separate pool of players...

Of course not, but we are still debating the sense or nonsense of SSDF results.
Could you please answer my questions?

Rolf Tueschen




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.