Computer Chess Club Archives

Subject: Re: How close/accurate will the rating be in a 10 game match? - Basics!

Author: Bertil Eklund

Date: 01:25:51 08/27/02


On August 26, 2002 at 18:13:56, Rolf Tueschen wrote:

>On August 26, 2002 at 07:54:49, Uri Blass wrote:
>
>>On August 26, 2002 at 07:21:58, Rolf Tueschen wrote:
>>
>>>On August 26, 2002 at 06:45:03, Uri Blass wrote:
>>>
>>>>On August 26, 2002 at 06:18:11, Rolf Tueschen wrote:
>>>>
>>>>>On August 25, 2002 at 12:00:06, Peter Fendrich wrote:
>>>>>
>>>>>>On August 25, 2002 at 07:59:54, Kurt Utzinger wrote:
>>>>>>
>>>>>>>Please have a look at
>>>>>>>
>>>>>>>http://ccc.it.ro/search/ccc.php?art_id=217174
>>>>>>>
>>>>>>>Regards
>>>>>>>Kurt
>>>>>>
>>>>>
>>>>>=============================================================================
>>>>>>These tables are not accurate at all for the lines covering only a few games.
>>>>>=============================================================================
>>>>>
>>>>>So, the tables are not correct (for the cases when you have only few, very few
>>>>>games!), "because" the tables require a normal distribution. So far so good. Now
>>>>>you are arguing, let's take binomial or trinomial, and then we could get rid
>>>>>of the limitations when you have very few cases (like in SSDF)? I hope I had no
>>>>>language interferences?
>>>>
>>>>SSDF usually plays hundreds of games with every program, so I do not see the
>>>>few-games problem.
>>>
>>>Excuse me, but I see it. How many hundreds of games they play, that could be
>>>added up?
>>
>>Here is the list of the programs above 2600.
>>You can see that the programs usually played more than 400 games.
>
>Yes, Uri, I knew it. But! I wrote "that could be added up". Didn't you know that
>the newest progs also play neandertal (M. Scheidl)? How could they even suggest
>that we accept such nonsense? The term validity comes into play. I repeat the
>question. What do they measure?? Learning function? Or book? This is so trivial
>for someone who knows what is important in statistics. But this is all a
>repetition. I think in May 2002 I had already explained all that. Only - you
>will notice that the SSDF team doesn't answer the exact questions. Not that it
>matters because I'm talking about undeniable facts. This is not my personal
>liking or my idea or wishful thinking.
>
>
>>
>> # Program                                   Rating    +    -  Games  Won  Oppo
>> 1 Fritz 7.0 256MB Athlon 1200 MHz             2741   30  -29    574  64%  2636
>> 2 Shredder 6.0 Paderb 256MB Athlon 1200       2727   34  -32    467  65%  2619
>> 3 Chess Tiger 14.0 CB 256MB Athlon 1200       2721   33  -32    487  63%  2627
>> 4 Gambit Tiger 2.0 256MB Athlon 1200          2718   31  -30    523  60%  2645
>> 5 Shredder 6.0 256MB Athlon 1200 MHz          2717   32  -31    505  64%  2618
>> 6 Deep Fritz 256MB Athlon 1200 MHz            2716   33  -32    491  63%  2622
>> 7 Junior 7.0 256MB Athlon 1200 MHz            2689   29  -29    593  58%  2632
>> 8 Rebel Century 4.0 256MB Athlon 1200 MHz    2684   33  -32    475  63%  2586
>> 9 Hiarcs 8.0 256MB Athlon 1200 MHz            2671   28  -28    624  55%  2638
>>10 Shredder 5.32 256MB Athlon 1200 MHz         2669   30  -30    538  57%  2622
>>11 Gandalf 4.32h 256MB Athlon 1200 MHz         2652   34  -33    430  54%  2624
>>12 Deep Fritz 128MB K6-2 450 MHz               2651   23  -23    959  61%  2571
>>13 Gandalf 5.0 256MB Athlon 1200 MHz           2642   49  -50    202  46%  2674
>>14 Gambit Tiger 2.0 128MB K6-2 450 MHz         2641   29  -28    634  66%  2525
>>15 Gandalf 5.1 256MB Athlon 1200 MHz           2638   26  -26    707  55%  2601
>>16 Junior 7.0 128MB K6-2 450 MHz               2632   25  -25    815  65%  2524
>>17 Chess Tiger 14.0 CB 128MB K6-2 450 MHz     2629   28  -27    667  62%  2543
>>18 Shredder 6.0 UCI 128MB K6-2 450 MHz         2627   55  -54    168  57%  2581
>>19 Fritz 7.0 128MB K6-2 450 MHz                2625   41  -41    294  53%  2604
>>20 Fritz 6.0 128MB K6-2 450 MHz                2619   21  -21   1110  61%  2541
>>21 Crafty 18.12/CB 256MB Athlon 1200 MHz       2613   30  -29    561  53%  2593
>>22 Shredder 5.32 128MB K6-2 450 MHz            2605   28  -27    639  58%  2547
>>
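A minimal sketch, not taken from any of the posts, of how the margins next to each rating in the list above shrink as the number of games grows. It assumes the standard logistic Elo curve and a normal approximation to the score with draws ignored; the thread does not say how SSDF actually computes its figures, and the names elo_diff and elo_margins are made up for illustration.

```python
import math


def elo_diff(score):
    """Rating difference implied by a score fraction (logistic Elo curve)."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_margins(score, games, z=1.96):
    """Approximate (plus, minus) Elo margins for a score over `games` games.

    Assumption: normal approximation to the score, draws ignored, so the
    interval is if anything too wide; z=1.96 corresponds to about 95%.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    eps = 1e-6  # keep the bounds strictly inside (0, 1) so log10 stays defined
    low = min(max(score - z * se, eps), 1.0 - eps)
    high = min(max(score + z * se, eps), 1.0 - eps)
    centre = elo_diff(score)
    return elo_diff(high) - centre, centre - elo_diff(low)


# Figures from the first line of the list: 64% over 574 games against 2636.
plus, minus = elo_margins(0.64, 574)
print(f"performance {2636 + elo_diff(0.64):.0f}  +{plus:.0f}  -{minus:.0f}")

# A hypothetical 10-game match won 6-4.
plus10, minus10 = elo_margins(0.60, 10)
print(f"10 games at 60%: +{plus10:.0f}  -{minus10:.0f}")
```

Under these assumptions the first line lands within a few points of the listed rating, with margins of about 30 points either way, while the 10-game match leaves margins of a few hundred points. The approximation itself becomes unreliable at such small samples, which is the limitation the thread is arguing about.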
>>>
>>>
>>>>
>>>>>
>>>>>Without agitation let me make this very clear. Any attempt to show something
>>>>>reasonable out of only very few cases (like in SSDF) is a myth. The limitation
>>>>>that comes with very few cases is a given. There is no way or "trick" to heal
>>>>>that.
>>>>>
>>>>>There is only one single remedy and that is the higher number of cases. And
>>>>>therefore the actual practice of SSDF is meaningless. And no adding would help
>>>>>you out of this mess since you are presenting over 30000 games but these games
>>>>>come from totally incomparable entities. But you could have known this before.
>>>>>The adding of games in human chess is a completely different process.
>>>>>
>>>>>BTW let me repeat the question where you take the validity from in SSDF. What do
>>>>>you measure? And how did you find control mechanisms?
>>>>>
>>>>>Also interesting could be where the similarities in Swedish ELO and human chess
>>>>>ELO are coming from? Is this decided by definition? When was it done?
>>>>>
>>>>>Rolf Tueschen
>>>>
>>>>The list is calculated also based on games of humans against old computers.
>>>
>>>Tournament games? Do you know details about the very few games then? I think we
>>>are talking about a myth, excuse me.

Most if not all games were published in PLY.

>>
>>You can download 14738 of their games in
>>http://home.interact.se/~w100107/welcome.htm but unfortunately I do not find
>>comp-human games there.
>>
>>
>>You can find a list of human-calibration results from 1987-1991, when 24 old
>>programs played against humans and got ratings based on an average of slightly
>>more than 10 games per program. Unfortunately there are only results and no
>>games, and Chris Carson's games do not include the games that they talk
>>about.
>
>That isn't even the most interesting thing. I take it for granted that they
>played these games. But. You can't take some 20 masters from Sweden and let them
>play a few skittles. This is not calibration. It's a joke. Do you think that
>"masters" had something to fear from commercial progs? I don't think so.
>Had they knowledge of the progs? Training? Interest at all? Incentive? Money?
>Where are the data from these events. The evidence. It doesn't work like that!
>You can't take some old master who has still 2450 in the lists and then put him
>in front of a program. And then you take the results as a proof for the strength

The games are from the Swedish Championships and other tournaments in the
respective class, and the results were included in the tournament. I can assure
you that most players focused mostly on the game against the computer.

Isn't it a bit strange that you are almost always wrong about facts?!
How can you believe that we can believe your other "subjective" accusations,
questions, hocus-pocus or just nonsense?

>of the machine. Uri - I know that you are participating in Israel's
>championships and therefore you know that what happens in such skittles is not
>realistic. Where nothing is at stake. The computer side takes masters to
>get Elo numbers! It isn't kosher, to say the least. It is surely _not_
>calibrating. The absence of the game scores is absolutely uninteresting when we
>are talking about skittles. And here the argument that SSDF is _not_ about
>science, but it's a private hobby, is _not_ acceptable. You see how you
>understood calibration! But without calibration and validity you have nothing
>but results and performances. But not Elo numbers comparable to human chess.
>
>BTW all this is _not_ a question of intelligence. Even the most intelligent
>people can be cheated with statistics. Because if you rely only on your
>natural human estimation you will inevitably miss the statistical tricks. With
>stats you can prove that toothbrushes cause the birth of babies. And with SSDF I
>can prove that FRITZ has 3000 ELO. :)
>
>>
>>See http://home.interact.se/~w100107/level.htm for the list of the programs that
>>played against humans and their ratings based on the games.
>
>Skittles. Shows. Fun.
>
>
>
>>>
>>>
>>>>
>>>>The ratings of the good programs in the list were too high, so they decided 1 or 2
>>>>years ago to reduce the rating of all programs by 100 elo to make the ratings of
>>>>the programs at the top of the list more realistic against humans.
>>>
>>>And now the height is ok? How did you prove it?
>>
>>I did not prove it, but I think that most people agree that 2841 for Fritz7 on
>>A1200 is at least 100 elo too high, so reducing the number by 100 elo reduces the
>>difference relative to humans.
>
>Uri, Uri! It brings tears to my eyes to see you argue so carefully. But you are
>already intoxicated. Please subtract 300 Elo numbers and then we can start the
>debate. Just my opinion. Other numbers are completely unrealistic. Or did you
>ever see events over a longer period of time, at tournament level, and with real
>money at stake? And most of all, did you see fair rules? The rules are still
>coming from the old days when progs were no real opponents. And also this. You
>know exactly that progs at one time are very good and in others they are weak
>like beginners.  I don't mean blunders, I mean misunderstanding very basic chess
>concepts. It's still a mess.
>
>Please please do not even think for a second that I have a lack of respect for
>you. Would I write such articles if I had?
>
>Rolf Tueschen
>
Bertil


