Computer Chess Club Archives


Subject: Re: Is the SSDF taking a break from testing?

Author: Dann Corbit

Date: 17:07:46 07/15/05


On July 15, 2005 at 19:27:56, Sune Fischer wrote:

>On July 15, 2005 at 18:29:28, Dann Corbit wrote:
>>
>>The time difference is really about 4:1
>
>Yes but they use faster machines.
>
>Anyway the point is simply that what we call quality today is what we will call
>crap in 3 years. Just like what we called quality 3 years ago is what we call
>crap today.
>
>Am I the only one who can see how ridiculous that is?

In 1905 you would have been happy with a car that had a top speed of 25 MPH.
Today you would not be.  To me, it is like saying, "There is no need to make
that any more beautiful because it is already beautiful enough."

No matter how beautiful it is, it can always become more beautiful.  And I will
like it better when it does.  Eventually, I suppose, I will no longer be able to
appreciate the moves.  Then it will be time to look at something else.

>>>How many people actually go over these tons and tons of automated games anyway?
>>
>>Not many.  I also find the games between the best programs very useful for book
>>building.  I would not trust the CEGT games for that purpose.
>
>I would prefer to use GM games, still.
>Needs another few years for "the quality" to be there ;)

Mostly, the opening books for the computer programs already came from there.
And I think that at 40/2 with ponder on, on a 1 GHz machine, the computer will
make fewer mistakes than any GM.  Yes, it will miss some brilliant positional
moves.  But on average, it will make excellent choices.  Actually, I think a mix
of SSDF + Correspondence + OTB makes the best books (as far as auto-generated
books go).  The "real" best books will be made by experts.

>>>In order to construct a usable and interesting rating list priority number 1 is
>>>to have enough games for a reliable rating, otherwise it _is_ going to be
>>>statistical garbage.
>>
>>Controlling the environment of the test so that it is reproducible is probably
>>in the same range of importance as the large number of games.
>
>I guess saving the logs should be enough, who is going to reproduce a long
>tournament anyway? :)
>
>But actually I agree with you, which is why I _don't_ like that the SSDF use
>books and learning. Fixed start positions give full control every single time.
>
>>The AEGT and CEGT games do not seem to be held at a consistent time control.
>
>That's not so good obviously, but probably the price you have to pay when making
>a rating list in a big distributed manner.

It may be that the different time controls are really what is wanted, if the
machines have been calibrated to some certain number of nodes or something.  At
any rate, I think the CEGT stuff is very good data.

>>The AEGT and CEGT contests assume the NUNN positions as openings, and so they do
>>not exercise the opening book of the program being tested.  That is fine to
>>measure engine strength, but it will not tell you about book+program and it will
>>not help you to prepare for that opponent (if it is a goal).
>
>People tend to make their own books and in tournaments the authors always use
>special (handcrafted) books, so in general it's a good idea to keep engine and
>book separated when measuring.

Depends on what you want to measure.  If it is engine strength, then I agree
with you.  If it is system strength, then I think the results will be wrong.

>>The older programs have more games against them and therefore are more accurate
>>as measuring tools.  But a lot of people get hot under the collar about running
>>games on 450 MHz computers when they do that.
>
>It doesn't matter if you have stable engines. What you do is you run elostat on
>the whole database every time, so ratings will automatically rescale.
>
>At least it seems foolish to play with an old engine if a newer version has
>been released. Remember you will still be playing with the old engine indirectly
>when you play against other engines that have played against it.

If I have ten games with a new engine of 3000 Elo and I have 10,000 games with
an old engine of 2300 Elo, the old engine will give me much better data by
playing against it than against the new one.  The new engine will have huge
error bars in the confidence interval, and these must necessarily translate to
the engine for which the calibrated engine is used as a reference.  I exaggerate
the numbers to make the meaning obvious, but the message is plain enough -- you
get better numbers from the measuring sticks with the finest graduations on
them.
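
As a rough illustration of the measuring-stick argument above, here is a minimal Python sketch of how the error bars shrink with the number of games. It is not how ELOstat or any rating list computes its intervals. The function names (`elo_diff`, `elo_margin`, `combined_margin`) are made up for this example, and it assumes the usual logistic Elo curve, independent games, and a win/loss-only model (draws are ignored, which overstates the margin somewhat). The score fraction must lie strictly between 0 and 1.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(score, games, z=1.96):
    """Rough (minus, plus) Elo margins for `score` achieved over `games` games.

    Assumption: each game is an independent win/loss trial. Draws reduce the
    real variance, so this overstates the margin somewhat.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    # Clamp the interval ends so the logarithm stays defined.
    lo = min(max(score - z * se, 1e-6), 1.0 - 1e-6)
    hi = min(max(score + z * se, 1e-6), 1.0 - 1e-6)
    centre = elo_diff(score)
    return centre - elo_diff(lo), elo_diff(hi) - centre

def combined_margin(reference_margin, match_margin):
    """Combine two margins in quadrature (assumes they are independent)."""
    return math.hypot(reference_margin, match_margin)

if __name__ == "__main__":
    for games in (10, 100, 1000, 10000):
        minus, plus = elo_margin(0.5, games)
        print(f"{games:>6} games: -{minus:.0f} / +{plus:.0f} Elo")
```

Under these assumptions, the uncertainty in the reference engine's own rating adds to the uncertainty of the match played against it, so a reference with very few games dominates the combined margin no matter how strong it is.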

>>There is some inconsistent naming of the program names in CEGT and AEGT,
>
>Such as?

AnMon 5.50                : 2556   81 114    38    47.4 %   2575   36.8 %
AnMon5.50                 : 2543   17  18  1088    41.2 %   2605   31.6 %

GLC 3.01.2.2              : 2490   41  30   265    52.5 %   2473   35.5 %
GLChess 3.01.2.2          : 2561  128 219    13    46.2 %   2588   46.2 %
GLChess 3.0122            : 2534   86 114    38    47.4 %   2552   31.6 %
Green Light Chess 3.01.2.2: 2561   24  32   423    46.0 %   2589   37.6 %
Green Light Chess 301.2.2 : 2547  126 126    22    50.0 %   2547   36.4 %
GreenLightChess 30122     : 2547  210 210    11    50.0 %   2547   27.3 %

Knight Dreamer 3.3        : 2372  102  73    51    24.5 %   2568   29.4 %
KnightDreamer 3.3         : 2367   31  23   497    50.8 %   2361   30.0 %

Naum 1.8-b1               : 2609  110  65    38    55.3 %   2572   52.6 %
Naum 1.8b1                : 2578   95  70    51    53.9 %   2551   37.3 %

Pepito 1.59               : 2574   93  93    38    50.0 %   2574   36.8 %
Pepito v1.59              : 2434   32  27   417    52.9 %   2414   28.1 %

Ruffian 2.1.0             : 2644   16  12  1715    52.3 %   2628   34.2 %
Ruffian 2.10              : 2573  147 453     5    30.0 %   2720   60.0 %

There are several others I am not sure of, like this one:
Shredder 9                : 2756   10  14  2666    69.1 %   2616   29.4 %
Shredder 9 UCI            : 2713  128 104    26    61.5 %   2631   38.5 %
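
One way to hunt for duplicates like these is to collapse case, spaces and punctuation and see which names collide. The following is only a sketch. `name_key` and `group_variants` are hypothetical helpers, not part of any rating tool. The normalization is deliberately crude: it can merge version strings that really are different (it cannot tell "2.1.0" from "2.10"), and it misses aliases such as "GLC" or a stray "v" prefix. Its output is therefore a list of candidates for manual review, not a set of confirmed duplicates.

```python
import re
from collections import defaultdict

def name_key(name):
    """Lower-case the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def group_variants(names):
    """Return lists of names that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [variants for variants in groups.values() if len(variants) > 1]

if __name__ == "__main__":
    sample = [
        "AnMon 5.50", "AnMon5.50",
        "Green Light Chess 3.01.2.2", "GreenLightChess 30122",
        "Naum 1.8-b1", "Naum 1.8b1",
        "Pepito 1.59", "Pepito v1.59",
    ]
    for variants in group_variants(sample):
        print(" | ".join(variants))
```

Once the suspected duplicates are confirmed by hand, the game records can be renamed to one spelling before the ratings are recomputed.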

>>only some professional programs have a very large number of games.
>
>You must be looking at a different list, I see
>Amy, AnMon, Delfi, Zappa and many others have 1000+ games.
>
>>There are also George and Leo's lists, all of which impart useful information.
>>
>>But I think it is not accurate to say that CEGT or AEGT can replace the SSDF.
>
>SSDF is just another list. They test the same handful of Chessbase engines again
>and again. Great if you're a big chessbase fan, boring if you're not.

Chessbase is my least favorite chess engine manufacturer among the professional
ranks.  And yet I truly enjoy the SSDF data.

>I want more engines tested and faster. When the SSDF is out 6 months later it's
>old news anyway.

There is that.  It is a stupendous effort to run 100,000 games at 40/2 like the
SSDF has done, and the next 100,000 will be just as painful as the first were to
generate.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.