Author: Sune Fischer
Date: 11:39:22 06/03/02
On June 03, 2002 at 13:22:09, Chris Carson wrote:

>On June 03, 2002 at 12:30:53, Sune Fischer wrote:
>
>>yes, if you could do a hypothesis testing, of course, but how do you do that?
>>I hypothesise there has been a slight inflation in rating, but also a slight
>>increase in average strength of top players. This means we should subtract a bit
>>from the current ratings of players, but their mean should still be higher.
>>How do you design a test to confirm this hypothesis?
>
>compare means between 1972 GM ratings and 2002 GM ratings. Since you use
>slight, I will assume you do not care about significance, although you can
>determine this with a t-test. The t-test is one stat test to help confirm your
>hypothesis (there are others you would actually do for a more detailed analysis).

I don't follow how you want to apply the t-test here. It will show you how one rating system correlates to another, but not how the underlying strengths correlate, which is what is interesting.

>If you think there has been ratings inflation, then by definition you are
>comparing the ratings of 2002 with 1972 (or whatever date you choose). Small
>changes up or down over time may not be significant.

That is my belief, but I have no way of proving it, since comparing two rating lists that have very bad correlation doesn't make sense to me. Maybe there are statistical tricks that will patch things up, i.e. reestablish the correlation. But you need something more than just the Elo scales to do that.

>>You can't very well ask the players from the past to solve a given test set of
>>positions...
>>You need some *fixpoint*, some universal scale to match up against, so far we
>>have been unable to design such a scale.
>
>Here is where we disagree, the FIDE ELO scale can be used. Yes the membership
>will change, but the rate of change is slow and provides a good measure. My
>guess is you disagree. Again let me encourage you to go learn how to study
>humans over time (longitudinal studies).
Yes, this is where we disagree. You assume it's valid, that there is a good correlation, and therefore you can do tests. But this is an assumption that puts you very close to what you want to prove; the proof is in your assumption, not in the statistics, AFAIK. If I give you two random distributions, what do you expect the t-test will show you?

You have Elo_1970(strength)=F(T_1970(strength)) and Elo_2002(strength)=G(T_2002(strength)). Now F(T(..)) and G(T(..)) are known distributions, namely the rating lists. But we want to find how the strength evolved in time, so how do we do that? If you treat F, G and T as unknowns (as I do), then you will get nowhere in your analysis; you need to make assumptions or approximations. That is, unless I'm overlooking something ;)

>There are other subjective ways to measure strength. I like the more objective
>ELO comparison, if you do not, then don't use it.

The Elo scale is fine, but it only works in the here and now.

>>Please tell me how to compare strengths when the elo scale is useless?
>
>I disagree that it is useless. Why would you want to throw it out?

If you have a method that is better, then what do we need it for? The Elo is already the best we have, by definition. I think it would be possible to make a better scale than Elo, starting now, but I'm not sure we could extrapolate it backwards in time.

-S.
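A minimal sketch of the point about F, G and T being unknown, under loud assumptions: the additive model "rating = hidden strength + hidden scale offset" is only an illustration, and every number here is invented (no real FIDE data). It builds two scenarios, one with a real 30-point gain in strength and one with pure 30-point inflation, and shows that they give the same 2002 rating list and therefore the same t statistic.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means (b minus a)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(
        va / len(a) + vb / len(b)
    )


def rating_list(strengths, offset):
    """Observed ratings, ASSUMING rating = hidden strength + hidden offset."""
    return [s + offset for s in strengths]


# Invented data: 100 hypothetical top players per era.
rng = random.Random(0)
base_1972 = [rng.gauss(2550, 50) for _ in range(100)]
base_2002 = [rng.gauss(2550, 50) for _ in range(100)]

ratings_1972 = rating_list(base_1972, 0)

# Scenario A: players really got 30 points stronger, no inflation.
strengths_a = [s + 30 for s in base_2002]
ratings_2002_a = rating_list(strengths_a, 0)

# Scenario B: strength unchanged, the scale drifted up by 30 points.
strengths_b = list(base_2002)
ratings_2002_b = rating_list(strengths_b, 30)

t_a = welch_t(ratings_1972, ratings_2002_a)
t_b = welch_t(ratings_1972, ratings_2002_b)

print("t, real strength gain:", t_a)
print("t, pure inflation:    ", t_b)
print("same observed list:   ", ratings_2002_a == ratings_2002_b)
```

The test only sees the rating lists, so it returns the same answer in both scenarios even though the hidden strengths differ by 30 points. Separating the two would take some extra information beyond the two lists, which is the fixpoint asked for above.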
Last modified: Thu, 15 Apr 21 08:11:13 -0700