Computer Chess Club Archives



Subject: Re: How close/accurate will the rating be in a 10 game match? - Basics!

Author: Uri Blass

Date: 04:54:49 08/26/02



On August 26, 2002 at 07:21:58, Rolf Tueschen wrote:

>On August 26, 2002 at 06:45:03, Uri Blass wrote:
>
>>On August 26, 2002 at 06:18:11, Rolf Tueschen wrote:
>>
>>>On August 25, 2002 at 12:00:06, Peter Fendrich wrote:
>>>
>>>>On August 25, 2002 at 07:59:54, Kurt Utzinger wrote:
>>>>
>>>>>Please have a look at
>>>>>
>>>>>http://ccc.it.ro/search/ccc.php?art_id=217174
>>>>>
>>>>>Regards
>>>>>Kurt
>>>>
>>>
>>>=============================================================================
>>>>These tables are not accurate at all for the lines covering only few games.
>>>=============================================================================
>>>
>>>So, the tables are not correct (for the cases when you only have few, very few
>>>games!), "because" the tables require normal distribution. So far so good. Now
>>>you are arguing, let's take binomial or trinomial, and then we could get rid
>>>of the limitations when you have very few cases (like in SSDF)? I hope I had no
>>>language interferences?
>>
>>SSDF usually plays hundreds of games with every program, so I do not see the
>>few-games problem.
>
>Excuse me, but I see it. How many hundreds of games do they play that could be
>added up?

Here is the list of the programs above 2600.
You can see that the programs usually played more than 400 games.

# Program                               Rating    +     -  Games  Won  Oppo
1 Fritz 7.0 256MB Athlon 1200 MHz         2741   30   -29   574   64%  2636
2 Shredder 6.0 Paderb  256MB Athlon 1200  2727   34   -32   467   65%  2619
3 Chess Tiger 14.0 CB 256MB Athlon 1200   2721   33   -32   487   63%  2627
4 Gambit Tiger 2.0  256MB Athlon 1200     2718   31   -30   523   60%  2645
5 Shredder 6.0  256MB Athlon 1200 MHz     2717   32   -31   505   64%  2618
6 Deep Fritz 256MB Athlon 1200 MHz        2716   33   -32   491   63%  2622
7 Junior 7.0  256MB  Athlon 1200 MHz      2689   29   -29   593   58%  2632
8 Rebel Century 4.0 256MB Athlon 1200 MHz 2684   33   -32   475   63%  2586
9 Hiarcs 8.0  256MB Athlon 1200 MHz       2671   28   -28   624   55%  2638
10 Shredder 5.32  256MB Athlon 1200 MHz    2669   30   -30   538   57%  2622
11 Gandalf 4.32h  256MB Athlon 1200 MHz    2652   34   -33   430   54%  2624
12 Deep Fritz  128MB K6-2 450 MHz          2651   23   -23   959   61%  2571
13 Gandalf 5.0  256MB Athlon 1200 MHz      2642   49   -50   202   46%  2674
14 Gambit Tiger 2.0  128MB K6-2 450 MHz    2641   29   -28   634   66%  2525
15 Gandalf 5.1  256MB Athlon 1200 MHz      2638   26   -26   707   55%  2601
16 Junior 7.0  128MB K6-2 450 MHz          2632   25   -25   815   65%  2524
17 Chess Tiger 14.0 CB 128MB K6-2 450 MHz  2629   28   -27   667   62%  2543
18 Shredder 6.0 UCI 128MB K6-2 450 MHz     2627   55   -54   168   57%  2581
19 Fritz 7.0  128MB K6-2 450 MHz           2625   41   -41   294   53%  2604
20 Fritz 6.0  128MB K6-2 450 MHz           2619   21   -21  1110   61%  2541
21 Crafty 18.12/CB 256MB  Athlon 1200 MHz  2613   30   -29   561   53%  2593
22 Shredder 5.32  128MB K6-2 450 MHz       2605   28   -27   639   58%  2547
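
As a rough check on the margins in the list, here is a minimal sketch that turns a score percentage and a number of games into a rating and an error margin. It assumes the columns are rating, plus/minus margin, games, score percentage and average opponent rating. It also assumes the logistic Elo curve and a normal approximation that treats every game as an independent win/loss trial (draws are ignored, which tends to overstate the margin). The function names are made up for this illustration, and this is not claimed to be the SSDF's own calculation.

```python
import math

def elo_diff(score):
    """Rating difference implied by a score in (0, 1) on the logistic Elo curve."""
    return 400.0 * math.log10(score / (1.0 - score))

def rating_with_margins(score, games, opp_avg, z=1.96):
    """Return (performance rating, plus margin, minus margin).

    Assumption: games are independent win/loss trials, so the standard
    error of the score is sqrt(p * (1 - p) / n). z=1.96 gives roughly a
    95% interval under the normal approximation.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    lo, hi = score - z * se, score + z * se
    base = elo_diff(score)
    perf = opp_avg + base
    plus = elo_diff(hi) - base    # upper margin in Elo points
    minus = elo_diff(lo) - base   # lower margin in Elo points (negative)
    return perf, plus, minus

# First row of the list: 64% over 574 games against an average of 2636.
perf, plus, minus = rating_with_margins(0.64, 574, 2636)
print(f"rating ~{perf:.0f}, margins +{plus:.0f} / {minus:.0f}")
```

For the first row this lands within a few points of the listed rating and close to the listed margins. The small difference in the rating is expected, because the percentage in the list is rounded and the list is presumably not computed from one average opponent.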

>
>
>>
>>>
>>>Without agitation let me make this very clear. Any attempt to show something
>>>reasonable out of only very few cases (like in SSDF) is a myst. The limitations
>>>out of very few cases is absolutely given. There is no way or "trick" to heal
>>>that.
>>>
>>>There is only one single remedy and that is the higher number of cases. And
>>>therefore the actual practice of SSDF is meaningless. And no adding would help
>>>you out of this mess since you are presenting over 30000 games but these games
>>>come from totally incomparable entities. But you could have known this before.
>>>The adding of games in human chess is a completely different process.
>>>
>>>BTW let me repeat the question where you take the validity from in SSDF. What do
>>>you measure? And how did you find control mechanisms?
>>>
>>>Also interesting could be where the similarities in Swedish ELO and human chess
>>>ELO are coming from? Is this decided by definition? When was it done?
>>>
>>>Rolf Tueschen
>>
>>The list is calculated also based on games of humans against old computers.
>
>Tournament games? Do you know details about the very few games then? I think we
>are talking about a myst, excuse me.

You can download 14738 of their games from
http://home.interact.se/~w100107/welcome.htm but unfortunately I could not find
any computer-human games there.


You can find a list of human-calibration results from 1987-1991, when 24 old
programs played against humans and got ratings based on an average of slightly
more than 10 games per program. Unfortunately there are only results and no
game scores, and Chris Carson's games do not include the games that they talk
about.

See http://home.interact.se/~w100107/level.htm for the list of the programs that
played against humans and their ratings based on those games.
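
To give a feeling for how little about 10 games per program can say, here is a small sketch using the Wilson score interval, which behaves better than the plain normal approximation when there are few games. It again treats each game as a win/loss trial and uses the logistic Elo curve. Both are simplifying assumptions, and the function names are only for illustration.

```python
import math

def wilson_interval(score, games, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    z2 = z * z
    denom = 1.0 + z2 / games
    center = (score + z2 / (2.0 * games)) / denom
    half = z * math.sqrt(score * (1.0 - score) / games
                         + z2 / (4.0 * games * games)) / denom
    return center - half, center + half

def elo_diff(score):
    """Rating difference implied by a score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

# An even score (50%) after 10, 100 and 500 games.
for games in (10, 100, 500):
    lo, hi = wilson_interval(0.5, games)
    print(f"{games:4d} games: {elo_diff(lo):+.0f} .. {elo_diff(hi):+.0f} Elo")
```

With 10 games the interval is on the order of 200 Elo in each direction, and it only shrinks to a few dozen points after several hundred games.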
>
>
>>
>>The ratings of the good programs in the list were too high, so they decided 1 or 2
>>years ago to reduce the ratings of all programs by 100 Elo to make the ratings of
>>the programs at the top of the list more realistic against humans.
>
>And now the height is ok? How did you prove it?

I did not prove it, but I think most people agree that 2841 for Fritz 7 on the
Athlon 1200 is at least 100 Elo too high, so reducing the ratings by 100 Elo
reduces the gap relative to humans.


Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700
