Author: Chris Carson
Date: 11:52:15 06/03/02
On June 03, 2002 at 14:39:22, Sune Fischer wrote:

>On June 03, 2002 at 13:22:09, Chris Carson wrote:
>
>>On June 03, 2002 at 12:30:53, Sune Fischer wrote:
>>
>>>Yes, if you could do hypothesis testing, of course, but how do you do that?
>>>I hypothesise there has been a slight inflation in ratings, but also a slight
>>>increase in the average strength of top players. This means we should subtract
>>>a bit from the current ratings of players, but their mean should still be higher.
>>>How do you design a test to confirm this hypothesis?
>>
>>Compare means between 1972 GM ratings and 2002 GM ratings. Since you use
>>"slight", I will assume you do not care about significance, although you can
>>determine this with a t-test. The t-test is one stat test to help confirm your
>>hypothesis (there are others you would actually do for a more detailed analysis).
>
>I don't follow how you want to apply the t-test here.
>It will show you how one rating system correlates to another, but not how the
>underlying strengths correlate, which is what is interesting.
>
>>If you think there has been ratings inflation, then by definition you are
>>comparing the ratings of 2002 with 1972 (or whatever date you choose). Small
>>changes up or down over time may not be significant.
>
>That is my belief, but I have no way of proving it, since comparing two rating
>lists that correlate very badly doesn't make sense to me.
>Maybe there are statistical tricks that will patch things up, i.e. re-establish
>the correlation. But you need something more than just the Elo scales to do
>that.
>
>>>You can't very well ask the players from the past to solve a given test set of
>>>positions...
>>>You need some *fixpoint*, some universal scale to match up against; so far we
>>>have been unable to design such a scale.
>>
>>Here is where we disagree: the FIDE Elo scale can be used. Yes, the membership
>>will change, but the rate of change is slow and provides a good measure. My
>>guess is you disagree.
>>Again let me encourage you to go learn how to study
>>humans over time (longitudinal studies).
>
>Yes, this is where we disagree.
>You assume it's valid, that there is a good correlation and therefore you can do
>tests. But this is an assumption that puts you very close to what you want to
>prove; the proof is in your assumption, not in the statistics AFAIK.
>
>If I give you two random distributions, what do you expect the t-test will show
>you?
>You have
>Elo_1970(strength)=F(T_1970(strength)) and
>Elo_2002(strength)=G(T_2002(strength)), now F(T(..)) and G(T(..)) are known
>distributions, namely the rating lists.
>But we want to find how the strength evolved in time; how do we do that?
>
>If you treat F, G and T as unknowns (as I do), then you will get nowhere in your
>analysis; you need to make assumptions or approximations, unless I'm
>overlooking something ;)
>
>>There are other subjective ways to measure strength. I like the more objective
>>Elo comparison; if you do not, then don't use it.
>
>The Elo scale is fine, but it only works in the here and now.
>
>>>Please tell me how to compare strengths when the Elo scale is useless?
>>
>>I disagree that it is useless. Why would you want to throw it out?
>
>If you have a method that is better, then what do we need it for?
>Elo is already the best we have, by definition.
>
>I think it would be possible to make a better scale than Elo, starting now, but
>I'm not sure we could extrapolate it backwards in time.
>
>-S.

Well, we just disagree.
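Chris's suggestion (compare the mean GM rating in 1972 with the mean in 2002 using a t-test) and Sune's objection to it can both be seen in a small sketch. This is an illustration only, not anything posted in the thread: it computes Welch's two-sample t statistic with the Python standard library (a t-test variant that does not assume equal variances; the thread does not say which form was meant) on made-up ratings for ten hypothetical GMs. The functions `F` and `G_inflated` are hypothetical stand-ins for the unknown rating mappings in Sune's notation.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic (mean of b minus mean of a) and its
    Welch-Satterthwaite degrees of freedom; equal variances are not assumed."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance uses n - 1).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_b) - statistics.mean(sample_a)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical playing strengths of ten GMs in 1972 (invented numbers, not FIDE data).
strength_1972 = [2500, 2520, 2540, 2550, 2560, 2580, 2600, 2620, 2650, 2700]


def F(s):
    """Hypothetical honest rating list: rating equals strength."""
    return s


def G_inflated(s):
    """Hypothetical 2002 rating list whose scale has drifted up 40 points."""
    return s + 40


# World A: nobody got stronger, but the 2002 scale is inflated by 40 points.
list_1972 = [F(s) for s in strength_1972]
list_2002_a = [G_inflated(s) for s in strength_1972]

# World B: everyone really got 40 points stronger, and the 2002 scale is honest.
strength_2002_b = [s + 40 for s in strength_1972]
list_2002_b = [F(s) for s in strength_2002_b]

t_a, df_a = welch_t(list_1972, list_2002_a)
t_b, df_b = welch_t(list_1972, list_2002_b)
print(f"inflation only:   t = {t_a:.2f}, df = {df_a:.1f}")
print(f"real improvement: t = {t_b:.2f}, df = {df_b:.1f}")
```

Both worlds produce identical rating lists and therefore identical t statistics (about 1.45 with 18 degrees of freedom for these invented numbers), so the test by itself cannot say whether a rise in the mean rating reflects stronger players or an inflated scale. Separating the two needs an assumption about F and G, which is the point the two posters disagree on.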