Computer Chess Club Archives


Subject: Re: How close/accurate will the rating be in a 10 game match? - Basics!

Author: Rolf Tueschen

Date: 04:35:17 08/28/02


On August 27, 2002 at 04:25:51, Bertil Eklund wrote:

>On August 26, 2002 at 18:13:56, Rolf Tueschen wrote:
>
>>On August 26, 2002 at 07:54:49, Uri Blass wrote:
>>
>>>On August 26, 2002 at 07:21:58, Rolf Tueschen wrote:
>>>
>>>>On August 26, 2002 at 06:45:03, Uri Blass wrote:
>>>>
>>>>>On August 26, 2002 at 06:18:11, Rolf Tueschen wrote:
>>>>>
>>>>>>On August 25, 2002 at 12:00:06, Peter Fendrich wrote:
>>>>>>
>>>>>>>On August 25, 2002 at 07:59:54, Kurt Utzinger wrote:
>>>>>>>
>>>>>>>>Please have a look at
>>>>>>>>
>>>>>>>>http://ccc.it.ro/search/ccc.php?art_id=217174
>>>>>>>>
>>>>>>>>Regards
>>>>>>>>Kurt
>>>>>>>
>>>>>>
>>>>>>=============================================================================
>>>>>>>These tables are not accurate at all for the lines covering only few games.
>>>>>>=============================================================================
>>>>>>
>>>>>>So, the tables are not correct (for the cases when you have only a few, very few
>>>>>>games!), "because" the tables require a normal distribution. So far so good. Now
>>>>>>you are arguing: let's take binomial or trinomial, and then we could get rid
>>>>>>of the limitations when you have very few cases (like in SSDF)? I hope I had no
>>>>>>language interference?
>>>>>
>>>>>SSDF usually plays hundreds of games with every program, so I do not see the
>>>>>"only a few games" problem.
>>>>
>>>>Excuse me, but I see it. How many hundreds of games do they play that could be
>>>>added up?
>>>
>>>Here is the list of the programs above 2600.
>>>You can see that the programs usually played more than 400 games
>>
>>Yes, Uri, I knew it. But! I wrote "that could be added up". Didn't you know that
>>the newest progs also play neandertal (M. Scheidl)? How could they even suggest
>>that we accept such nonsense. The term validity comes into play. I repeat the
>>question. What do they measure?? Learning function? Or book? This is so trivial
>>for someone who knows what is important in statistics. But this is all a
>>repetition. I think in May 2002 I had already explained all that. Only - you
>>will notice that the SSDF team doesn't answer the exact questions. Not that it
>>matters because I'm talking about undeniable facts. This is not my personal
>>liking or my idea or wishful thinking.
>>
>>
>>>
>>># Program, hardware                       Elo     +     - Games   Won  Oppo
>>>1 Fritz 7.0 256MB Athlon 1200 MHz         2741   30   -29   574   64%  2636
>>>2 Shredder 6.0 Paderb  256MB Athlon 1200  2727   34   -32   467   65%  2619
>>>3 Chess Tiger 14.0 CB 256MB Athlon 1200   2721   33   -32   487   63%  2627
>>>4 Gambit Tiger 2.0  256MB Athlon 1200     2718   31   -30   523   60%  2645
>>>5 Shredder 6.0  256MB Athlon 1200 MHz     2717   32   -31   505   64%  2618
>>>6 Deep Fritz 256MB Athlon 1200 MHz        2716   33   -32   491   63%  2622
>>>7 Junior 7.0  256MB  Athlon 1200 MHz      2689   29   -29   593   58%  2632
>>>8 Rebel Century 4.0 256MB Athlon 1200 MHz 2684   33   -32   475   63%  2586
>>>9 Hiarcs 8.0  256MB Athlon 1200 MHz       2671   28   -28   624   55%  2638
>>>10 Shredder 5.32  256MB Athlon 1200 MHz    2669   30   -30   538   57%  2622
>>>11 Gandalf 4.32h  256MB Athlon 1200 MHz    2652   34   -33   430   54%  2624
>>>12 Deep Fritz  128MB K6-2 450 MHz          2651   23   -23   959   61%  2571
>>>13 Gandalf 5.0  256MB Athlon 1200 MHz      2642   49   -50   202   46%  2674
>>>14 Gambit Tiger 2.0  128MB K6-2 450 MHz    2641   29   -28   634   66%  2525
>>>15 Gandalf 5.1  256MB Athlon 1200 MHz      2638   26   -26   707   55%  2601
>>>16 Junior 7.0  128MB K6-2 450 MHz          2632   25   -25   815   65%  2524
>>>17 Chess Tiger 14.0 CB 128MB K6-2 450 MHz  2629   28   -27   667   62%  2543
>>>18 Shredder 6.0 UCI 128MB K6-2 450 MHz     2627   55   -54   168   57%  2581
>>>19 Fritz 7.0  128MB K6-2 450 MHz           2625   41   -41   294   53%  2604
>>>20 Fritz 6.0  128MB K6-2 450 MHz           2619   21   -21  1110   61%  2541
>>>21 Crafty 18.12/CB 256MB  Athlon 1200 MHz  2613   30   -29   561   53%  2593
>>>22 Shredder 5.32  128MB K6-2 450 MHz       2605   28   -27   639   58%  2547
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>Without agitation let me make this very clear. Any attempt to show something
>>>>>>reasonable out of only very few cases (like in SSDF) is a myth. The limitation
>>>>>>that comes with very few cases is simply a given. There is no way or "trick" to
>>>>>>heal that.
>>>>>>
>>>>>>There is only one single remedy and that is the higher number of cases. And
>>>>>>therefore the actual practice of SSDF is meaningless. And no adding would help
>>>>>>you out of this mess since you are presenting over 30000 games but these games
>>>>>>come from totally incomparable entities. But you could have known this before.
>>>>>>The adding of games in human chess is a completely different process.
>>>>>>
>>>>>>BTW let me repeat the question where you take the validity from in SSDF. What do
>>>>>>you measure? And how did you find control mechanisms?
>>>>>>
>>>>>>Also interesting could be where the similarities in Swedish ELO and human chess
>>>>>>ELO are coming from? Is this decided by definition? When was it done?
>>>>>>
>>>>>>Rolf Tueschen
>>>>>
>>>>>The list is also calculated based on games of humans against old computers.
>>>>
>>>>Tournament games? Do you know details about the very few games then? I think we
>>>>are talking about a myth, excuse me.
>
>Most if not all games were published in PLY.

*************
Uhmm, please read the paragraph a little more closely; perhaps then the term
"details" becomes more visible. Since when is calibration done without further
details and rules?


>
>>>
>>>You can download 14738 of their games in
>>>http://home.interact.se/~w100107/welcome.htm but unfortunately I do not find
>>>comp-human games there.
>>>
>>>
>>>You can find a list of human-calibration results from 1987-1991, when 24 old
>>>programs played against humans and got ratings based on an average of
>>>slightly more than 10 games per program. Unfortunately there are
>>>only results and no games, and Chris Carson's games do not include the games
>>>that they talk about.
>>
>>That isn't even the most interesting thing. I take it for granted that they
>>played these games. But. You can't take some 20 masters from Sweden and let them
>>play a few skittles. This is not calibration. It's a joke. Do you think that
>>"masters" had something to fear from commercial progs? I don't think so.
>>Had they knowledge of the progs? Training? Interest at all? Incentive? Money?
>>Where are the data from these events. The evidence. It doesn't work like that!
>>You can't take some old master who still has 2450 in the lists and then put him
>>in front of a program. And then you take the results as proof of the strength
>
>The games are from the Swedish Championships and other tournaments in the
>respective class, and the results were included in the tournament. I can assure
>you that most players focused most on the game against the computer.

Very interesting. Do you have details for these events so that we could
understand _how_ you made your 'includings'? Such a question has _nothing_ to do
with allegations like "you didn't do it the way you are claiming" or such.
It's just a question about details you must surely have in your own documents
about the past events in SSDF.


>
>Isn't it a bit strange that you are almost always wrong about facts?!
>How can you believe that we can believe you in other "subjective" accusations,
>questions, hocus-pocus or just nonsense.


I hope that you are still a single person. Also I hope you realize that, for you
as the prominent representative of SSDF, it is bad to stigmatize absolutely
legitimate questions and reflections; such reactions always look as if you had
something to hide. You must also realize that, in contrast to CSS, here in CCC
you have the same obligation as anybody else to be polite and to leave personal
attacks aside. Finally, I recommend that you stick firmly to your 'private'
status in SSDF and not react as if you were to computer chess what the United
Nations is to international politics. You are private: you simply answer
questions where you can, and you say so when you can't. But please stop accusing
people like me in an insulting manner just when you are running out of good
answers. Debates and questions are the basis of such a forum; they are not
instruments to stab you in the back. Of course the lack of answers is not
really good for you, but to deny that lack is even worse. And if you have
certain weaknesses in your methods, it would be good if "we" _together_ could
find remedies instead of reacting emotionally. The claim of privacy alone can be
no answer, of course!

Rolf Tueschen

>
>>of the machine. Uri - I know that you are participating in Israel's
>>championships and therefore you know that what happens in such skittles is not
>>realistic. Where nothing is at stake. The computer side takes masters to
>>get Elo numbers! It isn't kosher to say the least. It is surely _not_
>>calibrating. The absence of the game scores is absolutely uninteresting when we
>>are talking about skittles. And here the argument that SSDF is _not_ about
>>science, but it's a private hobby, is _not_ acceptable. You see how you
>>understood calibration! But without calibration and validity you have nothing
>>but results and performances. But not Elo numbers comparable to human chess.
>>
>>BTW all this is _not_ a question of intelligence. Even the most intelligent
>>people can be cheated with statistics. Because once you rely only on your
>>natural human estimation you will inevitably miss the statistical tricks. With
>>stats you can prove that toothbrushes cause the birth of babies. And with SSDF I
>>can prove that FRITZ has 3000 ELO. :)
>>
>>>
>>>see http://home.interact.se/~w100107/level.htm for list of the programs that
>>>played against humans and their rating based on the games.
>>
>>Skittles. Shows. Fun.
>>
>>
>>
>>>>
>>>>
>>>>>
>>>>>The ratings of the good programs in the list were too high so they decided 1 or 2
>>>>>years ago to reduce the rating of all programs by 100 elo to make the rating of
>>>>>the programs in the top of the list more realistic against humans.
>>>>
>>>>And now the height is ok? How did you prove it?
>>>
>>>I did not prove it but I think that most people agree that 2841 for Fritz7 on
>>>A1200 is at least 100 elo too high so reducing the number by 100 elo reduces the
>>>difference relative to humans.
>>
>>Uri, Uri! It brings tears to my eyes to see you argue so carefully. But you are
>>already intoxicated. Please subtract 300 Elo numbers and then we can start the
>>debate. Just my opinion. Other numbers are completely unrealistic. Or did you
>>ever see events over a longer period of time, at tournament level, and with real
>>money at stake? And most of all, did you see fair rules? The rules are still
>>coming from the old days when progs were no real opponents. And also this. You
>>know exactly that progs at one time are very good and in others they are weak
>>like beginners.  I don't mean blunders, I mean misunderstanding very basic chess
>>concepts. It's still a mess.
>>
>>Please please do not even think for a second that I have a lack of respect for
>>you. Would I write such articles if I had?
>>
>>Rolf Tueschen
>>
>Bertil
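
A rough way to check the error margins in the SSDF list quoted above: the sketch
below converts a score percentage into an Elo difference with the usual logistic
formula and puts a normal-approximation 95% interval around it. The function
names are made up for illustration, and treating every game as a win-or-loss
trial (no draws) is an assumption here, not SSDF's documented method. For a
10-game match the normal approximation is exactly what the thread calls
unreliable, so that result only illustrates how wide the interval becomes.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def performance_with_margin(games, score, opp_avg, z=1.96):
    """Return (rating, low, high) from games played, score fraction and
    average opponent rating, using a normal approximation.

    Assumption: each game is a win/loss trial. Draws reduce the per-game
    spread, so this tends to overstate the margin.
    """
    se = math.sqrt(score * (1.0 - score) / games)  # standard error of the score
    eps = 1e-6  # keep the bounds strictly inside (0, 1) for the logarithm
    lo_score = max(eps, score - z * se)
    hi_score = min(1.0 - eps, score + z * se)
    rating = opp_avg + elo_diff(score)
    return rating, opp_avg + elo_diff(lo_score), opp_avg + elo_diff(hi_score)


if __name__ == "__main__":
    # Fritz 7.0 row from the quoted list: 574 games, 64%, opponents' average 2636.
    for games, score in ((574, 0.64), (10, 0.60)):
        rating, low, high = performance_with_margin(games, score, 2636)
        print(f"{games:4d} games at {score:.0%}: {rating:.0f} "
              f"(+{high - rating:.0f} / -{rating - low:.0f})")
```

With the Fritz 7.0 row this gives roughly 2736 with margins near +30/-29; the
listed 2741 presumably differs because the 64% is a rounded figure. With 10
games at 60% the interval spans several hundred Elo points.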




Last modified: Thu, 15 Apr 21 08:11:13 -0700
