Author: Robert Hyatt
Date: 17:21:20 10/10/02
On October 10, 2002 at 19:05:41, Rolf Tueschen wrote:

>On October 10, 2002 at 10:01:50, Robert Hyatt wrote:
>
>>On October 09, 2002 at 21:44:41, Rolf Tueschen wrote:
>>
>>>On October 09, 2002 at 12:59:24, Robert Hyatt wrote:
>>>
>>>>On October 09, 2002 at 05:54:18, Rolf Tueschen wrote:
>>>>
>>>>>On October 09, 2002 at 04:52:40, GuyHaworth wrote:
>>>>>
>>>>>>Totally agreed: only the differences between the ELO numbers are relevant.
>>>>>>
>>>>>>I believe there is an inflation effect in the ELO system. Sadly, investigating
>>>>>>this - by theory or simulation - hasn't got to the top of my 'to do' list yet.
>>>>>>
>>>>>>Anyway, the more games played, the narrower the confidence bands on ELO
>>>>>>figures, but the greater the inflation.
>>>>>>
>>>>>>I believe it was for this reason, or for the sake of credibility, that SSDF
>>>>>>knocked back the absolute numbers a couple of years ago. Maybe they knocked
>>>>>>100 points off or something?
>>>>>>
>>>>>>Other rating systems, like Thompson's for the PCA, maybe do the rating better
>>>>>>with less inflation, but they haven't been widely adopted. Perhaps that's a
>>>>>>pity.
>>>>>>
>>>>>>g
>>>>>
>>>>>In Germany I read an interesting idea from Detlev Pordzik, aka Elvis, that SSDF
>>>>>should lower their values by 250 Elo points. That would reduce the maximum
>>>>>numbers to 2500 and something.
>>>>>
>>>>>Again, as I've written hundreds of times, SSDF could do that, but the worst
>>>>>inherited error in SSDF is the testing of machines from DIFFERENT pools!
>>>>>Exitus. The End.
>>>>>
>>>>>Rolf Tueschen
>>>>
>>>>You lost me.
>>>
>>>Don't say such things without any emergency in sight!
>>>
>>>>The "pool" the SSDF tests is the pool of computer chess programs, and in that
>>>>regard, I don't see where they make any mistakes. Yes, they play games between
>>>>current programs and old programs.
>>>
>>>So you can't see any mistakes. Ok. And what about the control or the constancy
>>>of the variables? Did you forget that old programs have no learning at all? The
>>>differences in books? pppppp?!
>>
>>But that is _part_ of the system. If program A learns and program B does not,
>>then the expected win/loss ratio should favor program A.
>
>Bob, I can't believe that you are arguing this way. Again, what do they
>measure? Elo measured 'strength'!

That's the problem here. Elo does _not_ measure strength. It only predicts the
outcome of a match between any two players participating in the rating pool. One
could be lucky, choose moves by a random process, and win game after game. And
Elo would predict he would continue to win at that same frequency against the
same opponent, even though he is merely lucky and not good...

>The rest must be held constant. Yes or no?

No... Because the Elo system is just a statistical predictor mechanism that uses
past performance to predict future results. The rating "numbers" mean nothing.
Only when you compare two numbers do you get useful information, not just by
looking at one number and going "oooohhhhhh!"

>Elo works with the development of strength in time.

Indirectly. It uses results to predict future results. If results depend on
strength, then perhaps so. But you have to define strength carefully to make it
fit. If one player has some ability that the other doesn't have, then is he
stronger, or different? To Elo it doesn't matter. If that ability helps him win
more games, his rating will climb even if he is no stronger.
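(For concreteness, here is a minimal sketch of the standard Elo expected-score
formula behind the "predictor, not a strength meter" point above. The ratings
and K-factor are made-up illustrative values, not SSDF data.)

    def expected_score(r_a, r_b):
        # Expected score of A against B under the Elo model; only the
        # difference r_a - r_b matters, the absolute numbers cancel out.
        return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

    def update(r_a, r_b, score_a, k=16.0):
        # One post-game update: past results feed the predictor, nothing more.
        delta = k * (score_a - expected_score(r_a, r_b))
        return r_a + delta, r_b - delta

    # A 200-point gap predicts roughly a 76% score for the higher-rated side,
    # whether the pair is rated 2500/2300 or 1500/1300.
    print(expected_score(2500, 2300))   # ~0.76
    print(expected_score(1500, 1300))   # ~0.76

Shifting both ratings by the same constant changes nothing, which is why only
the difference between two numbers carries information.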
>One important aspect of the listing. Did you ever see that e.g. data is taken
>from Blind chess Exhibitions? Can't you understand what I'm talking about? Elo
>is based on tournament chess of the same species.

That is flawed thinking. The USCF, and FIDE and so forth, do use the "same
species" because humans don't change very quickly. But keep the Elo system in
place for several hundred centuries... How would an ancient cave-man compare to
today's chess players when his IQ would be 1/4 of the typical chess player
today? Computers have a _much_ more rapid scale of evolution, so that on the
SSDF we do see cavemen, great players, and even Noonian Singh Khan (if you know
that name). :)

>Red hairs, spectacles, wooden legs are regarded as unimportant variables. But in
>computer chess, you can take learning or not learning all in the same data
>pool? -- I remind you of the Blind chess data! You get it now? Excuse me but I
>find it so strange that you defend the SSDF nonsense.

The problem is that you are trying to compare humans to computers again. And it
is not an easy comparison. 30 years ago chess programs were 1500. Today they are
2500. Did humans improve that much overall in 30 years? Not even close. So
trying to hold computers to the same standards makes very little sense, IMHO...

>>>Bob, next week you'll tell me that the handicapped from the Paralympics could
>>>well "run" against the US 100 meter athletes! They are from the same pool, no?
>>>All human species. <cough>
>>
>>Yes they could compete. They would lose, and they would be ranked _below_ the
>>non-handicapped folks of course. As they should be if you want to compare them
>>to each other...
>
>You make a huge mistake. Because you can only compare what is "comparable"! Your
>data must be clean. SSDF is not clean.

It is only comparing 20 years of players (computers). FIDE/USCF has ratings
_much_ older than that. But again, humans and computer chess players are
evolving at a different pace...

>>>I thought that we (at least) would know that it's making no sense if we test
>>>several variables free-floating at the same time. I mean, what would the
>>>results tell us? Or is it of great interest for you to receive statistical
>>>values for the obvious? That old is weaker than new? I mean, isn't it nonsense
>>>to prove that slow machines are weaker than fast ones?
>>
>>The only testing flaw in the SSDF that I see has two prongs: (a) too many games
>>between two programs; (b) not enough games against the _entire_ pool of
>>players.
>
>See above...
>
>>>My God, and you start a debate about inflation? I can't get it into my head
>>>what's going on here. Can't you see the ugly consequences if you give your
>>>blessing for such apparent nonsense?
>>
>>The statistics _demand_ that old programs play new to establish ratings.
>>Otherwise you have two separate pools of players and the ratings don't mean
>>anything across the pools.
>
>Bob, this is getting worse in minutes. Because you want to compare things you
>can't compare, you argue that you must play programs of totally different eras.
>Which can't be compared - in the variable strength. Isn't that easy to
>understand?

So I shouldn't play a rated game against Korchnoi? Or against Bill Lombardy? I
should only play players in my "age group"?

>>>I know - you want to play games on me, right?
>>
>>not at all...
>
>I can't believe it. You leave me inconsolable. That can't be true!
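(A small sketch of the "two separate pools" point above: if one group of
programs never plays the other, any constant can be added to that group's
ratings without changing a single within-group prediction, so the gap between
the groups stays arbitrary until cross-pool games are played. The program names
and numbers below are hypothetical.)

    def expected_score(r_a, r_b):
        # Standard Elo expected-score formula.
        return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

    old_pool = {"OldProgA": 2200, "OldProgB": 2350}             # hypothetical
    shifted  = {name: r + 300 for name, r in old_pool.items()}  # re-anchored pool

    # Identical within-pool predictions before and after the +300 shift, so
    # without games against the other pool the shift is undetectable.
    print(expected_score(old_pool["OldProgB"], old_pool["OldProgA"]))  # ~0.70
    print(expected_score(shifted["OldProgB"], shifted["OldProgA"]))    # ~0.70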
>>>>Yes this tends to inflate their absolute ratings at the top of the list. But
>>>>the "pool" is valid, and the ratings do tend to reflect results between any
>>>>two players in the SSDF pool.
>>>
>>>And they are valid for what variables please?
>>
>>To predict the match outcome between any two players in the pool, nothing more.
>
>So SSDF, the list you once called indispensable, is stating things like: at
>midday it's 12 o'clock and we eat dinner... Afterwards our weight increased...
>Well done!

As I said, computers evolve at a different speed than humans. That is what is
causing the consternation...

>>>>IE if you simply pick _any_ two players on the SSDF list, and compare their
>>>>ratings, and then play a match between them, their ratings will pretty
>>>>closely predict the match outcome. And that is as it should be.
>>>
>>>Are you sure? So you can't sleep before you get the new results? That CST
>>>version 1 on a PI is weaker than, say, Fritz 7 on a PIII 2.500? Wow!
>>
>>That is obvious. But the question is, "how much weaker"? And "how much weaker
>>are the programs we didn't test it against?" That is what the "rating" is all
>>about.
>
>How do you define weakness? Isn't weakness already implied if you don't have
>learning tools? And how do you measure the weakness if the machines without
>learning end up as the weakest computers? Could you explain what 'weakness'
>means in SSDF?
>
>Rolf Tueschen

Weakness can be defined in many ways. I wouldn't want to play against Khan. Nor
fight with him. It would be a problem. But it isn't because I don't have the
tools I need to do today's tasks... it is just that he has new and improved
tools that are simply better than the best I have.

>>>>The ratings will _not_ predict how the programs will do against programs not
>>>>in the SSDF list, nor against humans with FIDE ratings that come from a
>>>>completely separate pool of players...
>>>
>>>Of course not, but we are still debating the sense or nonsense of SSDF results.
>>>Please could you answer my questions?
>>>
>>>Rolf Tueschen
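(To put a hypothetical number on "predict the match outcome between any two
players in the pool": if two list entries read 2650 and 2530, the 120-point gap
gives an expected score of 1/(1 + 10^(-120/400)) ~ 0.67 for the higher-rated
program, i.e. roughly 27 points out of a 40-game match. The formula says nothing
about why either program scores what it does, which is exactly the limit being
debated above.)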