Computer Chess Club Archives


Subject: Re: How close/accurate will the rating be in a 10 game match? - Basics!

Author: Rolf Tueschen

Date: 15:13:56 08/26/02


On August 26, 2002 at 07:54:49, Uri Blass wrote:

>On August 26, 2002 at 07:21:58, Rolf Tueschen wrote:
>
>>On August 26, 2002 at 06:45:03, Uri Blass wrote:
>>
>>>On August 26, 2002 at 06:18:11, Rolf Tueschen wrote:
>>>
>>>>On August 25, 2002 at 12:00:06, Peter Fendrich wrote:
>>>>
>>>>>On August 25, 2002 at 07:59:54, Kurt Utzinger wrote:
>>>>>
>>>>>>Please have a look at
>>>>>>
>>>>>>http://ccc.it.ro/search/ccc.php?art_id=217174
>>>>>>
>>>>>>Regards
>>>>>>Kurt
>>>>>
>>>>
>>>>=============================================================================
>>>>>These tables are not accurate at all for the lines covering only a few games.
>>>>=============================================================================
>>>>
>>>>So the tables are not correct (for the cases where you have only very few
>>>>games!) "because" the tables require a normal distribution. So far so good.
>>>>Now you are arguing: let's take the binomial or trinomial distribution, and
>>>>then we could get rid of the limitations that come with very few cases (as
>>>>in SSDF)? I hope there was no language interference?
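
To make the small-sample point concrete, here is a minimal Python sketch that compares the normal-approximation interval with an exact binomial (Clopper-Pearson) interval for a short match. It is an illustration only, not the method behind the tables discussed above: the function names are made up for this example, and every game is treated as a plain win or loss, so draws (the trinomial case) are ignored.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _bisect(f, lo=0.0, hi=1.0, iters=60):
    """Root of a function that is positive at lo and negative at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(wins, n, alpha=0.05):
    """Clopper-Pearson (exact binomial) interval for the true win probability."""
    if wins == 0:
        lower = 0.0
    else:
        # Smallest p for which seeing at least `wins` wins has probability alpha/2.
        lower = _bisect(lambda p: alpha / 2 - (1 - binom_cdf(wins - 1, n, p)))
    if wins == n:
        upper = 1.0
    else:
        # Largest p for which seeing at most `wins` wins has probability alpha/2.
        upper = _bisect(lambda p: binom_cdf(wins, n, p) - alpha / 2)
    return lower, upper

def normal_interval(wins, n, z=1.96):
    """Normal-approximation interval; unreliable when n is small."""
    p = wins / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

if __name__ == "__main__":
    for wins, n in [(7, 10), (10, 10), (367, 574)]:
        print(wins, n, "exact:", exact_interval(wins, n),
              "normal:", normal_interval(wins, n))
```

With 10 games the two intervals differ noticeably, and for a 10-0 result the normal approximation collapses to a single point while the exact interval stays wide. With several hundred games the two come much closer together. Neither interval says anything about whether the games measure what they are supposed to measure.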
>>>
>>>SSDF usually plays hundreds of games with every program, so I do not see the
>>>few-games problem.
>>
>>Excuse me, but I see it. How many hundreds of games do they play that could be
>>added up?
>
>Here is the list of the programs above 2600.
>You can see that the programs usually played more than 400 games.

Yes, Uri, I knew that. But! I wrote "that could be added up". Didn't you know
that the newest progs also play "Neanderthals" (M. Scheidl)? How could they even
suggest that we accept such nonsense? The term validity comes into play. I
repeat the question: what do they measure?? Learning function? Or book? This is
trivial for anyone who knows what is important in statistics. But all this is
repetition. I think I had already explained all that in May 2002. Only - you
will notice that the SSDF team doesn't answer the exact questions. Not that it
matters, because I'm talking about undeniable facts. This is not my personal
preference, my idea, or wishful thinking.


>
>1 Fritz 7.0 256MB Athlon 1200 MHz         2741   30   -29   574   64%  2636
>2 Shredder 6.0 Paderb  256MB Athlon 1200  2727   34   -32   467   65%  2619
>3 Chess Tiger 14.0 CB 256MB Athlon 1200   2721   33   -32   487   63%  2627
>4 Gambit Tiger 2.0  256MB Athlon 1200     2718   31   -30   523   60%  2645
>5 Shredder 6.0  256MB Athlon 1200 MHz     2717   32   -31   505   64%  2618
>6 Deep Fritz 256MB Athlon 1200 MHz        2716   33   -32   491   63%  2622
>7 Junior 7.0  256MB  Athlon 1200 MHz      2689   29   -29   593   58%  2632
>8 Rebel Century 4.0 256MB Athlon 1200 MHz 2684   33   -32   475   63%  2586
>9 Hiarcs 8.0  256MB Athlon 1200 MHz       2671   28   -28   624   55%  2638
>10 Shredder 5.32  256MB Athlon 1200 MHz    2669   30   -30   538   57%  2622
>11 Gandalf 4.32h  256MB Athlon 1200 MHz    2652   34   -33   430   54%  2624
>12 Deep Fritz  128MB K6-2 450 MHz          2651   23   -23   959   61%  2571
>13 Gandalf 5.0  256MB Athlon 1200 MHz      2642   49   -50   202   46%  2674
>14 Gambit Tiger 2.0  128MB K6-2 450 MHz    2641   29   -28   634   66%  2525
>15 Gandalf 5.1  256MB Athlon 1200 MHz      2638   26   -26   707   55%  2601
>16 Junior 7.0  128MB K6-2 450 MHz          2632   25   -25   815   65%  2524
>17 Chess Tiger 14.0 CB 128MB K6-2 450 MHz  2629   28   -27   667   62%  2543
>18 Shredder 6.0 UCI 128MB K6-2 450 MHz     2627   55   -54   168   57%  2581
>19 Fritz 7.0  128MB K6-2 450 MHz           2625   41   -41   294   53%  2604
>20 Fritz 6.0  128MB K6-2 450 MHz           2619   21   -21  1110   61%  2541
>21 Crafty 18.12/CB 256MB  Athlon 1200 MHz  2613   30   -29   561   53%  2593
>22 Shredder 5.32  128MB K6-2 450 MHz       2605   28   -27   639   58%  2547
>
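
As a rough check on how numbers like these relate to each other, the sketch below converts a score percentage against an average opponent rating into a rating with the standard logistic Elo formula, and estimates a 95% margin from the number of games with a normal approximation. It reads the columns of the list as rating, plus/minus margin, games, score and average opponent rating, which is an assumption since the list has no header. It is not SSDF's actual procedure: the function names are invented for this example, draws are ignored (which overstates the variance somewhat), and the percentages in the list are rounded.

```python
import math

def elo_diff(score):
    """Rating difference implied by an expected score in (0, 1), logistic Elo model."""
    return 400 * math.log10(score / (1 - score))

def performance_rating(score, avg_opponent):
    """Rating that would produce `score` against opponents rated `avg_opponent`."""
    return avg_opponent + elo_diff(score)

def _clamp(p, eps=1e-9):
    """Keep a probability strictly inside (0, 1) so the logarithm stays defined."""
    return min(max(p, eps), 1 - eps)

def elo_margin(score, games, z=1.96):
    """Approximate (plus, minus) Elo margins, treating each game as win/loss."""
    se = math.sqrt(score * (1 - score) / games)
    centre = elo_diff(score)
    plus = elo_diff(_clamp(score + z * se)) - centre
    minus = centre - elo_diff(_clamp(score - z * se))
    return plus, minus

if __name__ == "__main__":
    # First row of the list: 64% over 574 games against an average of 2636.
    print("rating:", round(performance_rating(0.64, 2636)))
    print("margin with 574 games:", elo_margin(0.64, 574))
    # Same score percentage, but in a 10 game match.
    print("margin with 10 games:", elo_margin(0.64, 10))
```

Under these assumptions the margin for 574 games comes out at roughly 30 Elo either way, while the same percentage over 10 games gives margins of a few hundred Elo. The sketch only covers sampling error. It cannot tell you whether games against very different opponents can be added up, or whether the scale matches human Elo.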
>>
>>
>>>
>>>>
>>>>Without agitation, let me make this very clear. Any attempt to show something
>>>>reasonable out of only very few cases (as in SSDF) is a myth. The limitations
>>>>that come from very few cases are simply a given. There is no way or "trick"
>>>>to heal that.
>>>>
>>>>There is only one single remedy and that is the higher number of cases. And
>>>>therefore the actual practice of SSDF is meaningless. And no adding would help
>>>>you out of this mess since you are presenting over 30000 games but these games
>>>>come from totally incomparable entities. But you could have known this before.
>>>>The adding of games in human chess is a completely different process.
>>>>
>>>>BTW let me repeat the question of where you take the validity from in SSDF.
>>>>What do you measure? And how did you find control mechanisms?
>>>>
>>>>It would also be interesting to know where the similarities between Swedish
>>>>Elo and human chess Elo come from. Is this decided by definition? When was it
>>>>done?
>>>>
>>>>Rolf Tueschen
>>>
>>>The list is also calculated based on games of humans against old computers.
>>
>>Tournament games? Do you know details about the very few games then? I think we
>>are talking about a myth, excuse me.
>
>You can download 14738 of their games at
>http://home.interact.se/~w100107/welcome.htm but unfortunately I cannot find
>comp-human games there.
>
>
>You can find a list of human-calibration results from 1987-1991, when 24 old
>programs played against humans and got ratings based on an average of slightly
>more than 10 games per program. Unfortunately there are only results and no
>games, and Chris Carason's games do not include the games that they talk
>about.

That isn't even the most interesting thing. I take it for granted that they
played these games. But you can't take some 20 masters from Sweden and let them
play a few skittles games. This is not calibration. It's a joke. Do you think
that the "masters" had anything to fear from commercial progs? I don't think so.
Did they have knowledge of the progs? Training? Any interest at all? Incentive?
Money? Where are the data from these events? The evidence? It doesn't work like
that! You can't take some old master who still has 2450 in the lists, put him
in front of a program, and then take the results as proof of the strength of
the machine. Uri, I know that you participate in Israel's championships, and
therefore you know that what happens in such skittles games, where nothing is
at stake, is not realistic. The computer side uses masters to get Elo numbers!
It isn't kosher, to say the least. It is surely _not_ calibration. The absence
of the game scores is absolutely uninteresting when we are talking about
skittles. And here the argument that SSDF is _not_ about science, but a private
hobby, is _not_ acceptable. You see how you understood calibration! But without
calibration and validity you have nothing but results and performances, not Elo
numbers comparable to human chess.

BTW all this is _not_ a question of intelligence. Even the most intelligent
people can be cheated with statistics. Because once you rely only on your
natural human intuition, you will inevitably miss the statistical tricks. With
stats you can prove that toothbrushes cause the birth of babies. And with SSDF I
can prove that FRITZ has 3000 ELO. :)

>
>See http://home.interact.se/~w100107/level.htm for a list of the programs that
>played against humans and their ratings based on those games.

Skittles. Shows. Fun.



>>
>>
>>>
>>>The ratings of the good programs in the list were too high, so they decided 1
>>>or 2 years ago to reduce the ratings of all programs by 100 Elo to make the
>>>ratings of the programs at the top of the list more realistic against humans.
>>
>>And now the level is OK? How did you prove it?
>
>I did not prove it, but I think that most people agree that 2841 for Fritz7 on
>A1200 is at least 100 Elo too high, so reducing the number by 100 Elo reduces
>the difference relative to humans.

Uri, Uri! It brings tears to my eyes to see you argue so carefully. But you are
already intoxicated. Please subtract 300 Elo points and then we can start the
debate. Just my opinion. Other numbers are completely unrealistic. Or did you
ever see events over a longer period of time, at tournament level, and with real
money at stake? And most of all, did you see fair rules? The rules still come
from the old days when progs were no real opponents. And also this: you know
very well that progs are very good at some times and at others as weak as
beginners. I don't mean blunders, I mean misunderstanding very basic chess
concepts. It's still a mess.

Please please do not even think for a second that I have a lack of respect for
you. Would I write such articles if I had?

Rolf Tueschen


>
>
>Uri


