Computer Chess Club Archives



Subject: Re: And I just don't get that ...

Author: Mike S.

Date: 04:05:50 01/04/04



On January 03, 2004 at 20:53:39, Rick Rice wrote:

>(...)

>Ideal solution would be for SSDF to have one massive board with one CPU and
>memory for each program (equal CPU and mem for all the progs on its list) and
>some way to automate the play of these programs against each other..... on
>different time controls such as regular, blitz etc. Just wishful thinking for
>the future, but it would eliminate the multiple and varying results.

Actually, that's what they do anyway, with the exception that they run only
40/2h (and of course they can't test every existing engine, due to limited
capacity).

But we can ask further: what is the use of knowing which engine is "most
probably" 20 Elo points better or worse? How do you use that information; what
is the point of it?

Programmers need such information to determine whether their current engines
have improved, in addition to other tests (positions, and more technical things,
I guess). But they need their own sources anyway, because SSDF and other
"public" testers usually can't start before the engine is released.
(Beta tests in these common rating lists remain the exception.)

It has been explained that even ratings based on many games can't support highly
reliable predictions for short matches, e.g. 10 games, especially when the
strength margins between many engines are as narrow as they are now. In short
matches, (nearly) anything can happen.
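To put a rough number on that point, here is a toy calculation (my own
illustration, not from the post): it uses the standard Elo expectancy formula
and, for simplicity, models a 10-game match as independent wins/losses with no
draws. Even so simplified, it shows how often the "weaker" engine still wins a
short match.

```python
from math import comb

def expected_score(elo_diff):
    # Standard Elo expectancy: expected score for the stronger side,
    # given a rating difference in Elo points.
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# A 20-Elo edge corresponds to an expected score of only about 53%.
p = expected_score(20)

# Toy model of a 10-game match: each game is an independent win/loss
# with probability p for the stronger engine (draws ignored).
# Probability the stronger engine scores fewer than 5 wins, i.e. the
# 20-Elo "favourite" loses the match outright:
p_upset = sum(comb(10, k) * p**k * (1 - p)**(10 - k) for k in range(5))

print(f"expected score: {p:.3f}, upset probability: {p_upset:.3f}")
```

The upset probability comes out around one third, so a single 10-game match
says almost nothing about a 20-Elo gap.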

1. When somebody wants to run engine X against Y, does he need rating
information beforehand, for a prediction? What for?

(I just hope nobody runs matches only to verify or falsify predictions based on
some ratings :-))

2. If the rating competitions can't produce valuable data that is really worth
the effort, I'm beginning to doubt that they make sense at all.

3. I suggest that every fan and user should define his *individual* set of
requirements, IOW what an engine should be capable of so that he considers it
to suit his needs, or what it takes for him to consider it a top engine worth
buying the next version of, etc. This doesn't have to be objective (rather,
individually subjective). For example, I've defined a set of tactical positions
which tell me about an engine's combinative strength. For me, this is more
interesting than overall comp-comp game performance. SSDF Elo ratings are
relatively "artificial" values, correctly based on performance of course, but
not meaning much for the way I actually use engines. - Another thing is endgame
knowledge; I like it better when engines have a certain understanding of the
late endgame and do not rely on tablebases too much (but I haven't defined a
test set for that yet). These things don't have to correspond to the strength
relations in games at all.

Another user may be more interested in good positional play or evaluation,
understanding of positional elements... stronger players than me can probably
define their own test sets for that as well, or even judge it by playing
against the engines themselves. Some users do this, and/or provide in-depth
commentary on the message boards.

I think that way of (individually) forming impressions and estimations of
engines should be considered much more often, instead of always looking at all
these Elo figures. It gets boring. An engine may be 100 Elo points behind and
still play great, entertaining games (and vice versa)...

Regards,
M.Scheidl




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.