Computer Chess Club Archives


Subject: Re: Proving something is better

Author: Vincent Diepeveen

Date: 17:13:10 12/17/02



On December 17, 2002 at 19:49:47, Bob Durrett wrote:

There is a lot of truth in your statement.

The best kind of testset is indeed a collection of positions
of the type "you must find this move for sure within 3 minutes
or your engine sucks ass, as you lose the game
if you don't play that move".
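
A rough sketch of that pass/fail criterion, in Python. The names
count_solved and TIME_LIMIT_SECONDS and the sample data are hypothetical,
not taken from any existing test suite. The assumption is that for each
position you have already recorded the time at which the engine settled
on the key move, or None if it never did:

```python
from typing import Dict, Optional

TIME_LIMIT_SECONDS = 180.0  # the "3 minutes" mentioned above

def count_solved(solve_times: Dict[str, Optional[float]],
                 limit: float = TIME_LIMIT_SECONDS) -> int:
    """Count positions whose key move was found within the time limit.

    solve_times maps a position id to the time in seconds at which the
    engine found the key move, or None if it never found it.
    """
    return sum(1 for t in solve_times.values()
               if t is not None and t <= limit)

# Made-up results, for illustration only.
results = {"pos1": 12.5, "pos2": None, "pos3": 240.0, "pos4": 180.0}
print(count_solved(results), "of", len(results), "solved")
```

A position found after the limit counts the same as one never found,
which matches the "find it in time or lose the game" idea.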

However, such a realistic test would contain hardly any tactical
positions anymore. It is trivial nowadays to make something
tactically stronger than any commercial program (assuming
equal hardware of course, which nowadays, with Brutus on
an FPGA chip and Diep on a supercomputer, isn't even
close to the truth).

It's about who makes the worst moves.

One thing no testset can measure, however, is how
a program plays. Some programs are world champions at getting
themselves into trouble despite not being too bad overall;
they just play without a plan and then reach a position
which was correct to enter, but then they do not know
how to play on. I fear Diep is one of the programs
belonging to that league :)

On the other hand, some programs play very dubious
chess (Junior for example) but manage to win because they
know how to continue in the positions they create.

On any positional testset, I bet Junior will come out as the
positionally worst commercial program (a lot of that positional
badness is sometimes caused by the selective form of searching in
Junior, I guess; a bad positional move then delays the problems,
whereas the good moves get searched a lot deeper and get a different
score, so they do not pop up as best). Yet it's a 3-fold world champion,
very interesting and definitely no coincidence!

Another problem a testset has in measuring how good a program
is, is the patzer problem.

In general many testsets nowadays focus upon patzer moves. A big
patzer testset is the WM test. Another one is gs2930.

One of the weaker endgame programs in the world (in the commercial
league) is Tiger1. Tiger2 is far superior in the endgame compared to
Tiger1. Yet Tiger1 scores better on endgame testsets than Tiger2.
Very weird.

Fritz, on the other hand, is a big patzer and scores very well
on king-side problems. No program beats it on such testsets. It
seems to have been created to mate opponents :)

Patzer moves are hard to prevent. Of course they should not always
be prevented. Patzer moves usually give a nice game, but not
always a good game :)

Then another major problem is passive versus active play.

Shredder won a bunch of world titles playing very passive chess.

At a certain moment the opponents undertake an action, and lose
because of that action.

Yet passive engines always score a lot lower on all the existing
testsets.

Best regards,
Vincent

>On December 17, 2002 at 19:36:09, Vincent Diepeveen wrote:
>
>>On December 17, 2002 at 19:10:42, Dann Corbit wrote:
>>
><snip>
>
>Perhaps a useful test would be to measure how long a chess engine takes to get
>the right answer for a large set of diverse test positions.
>
>There would have to be some simple measure of "getting the right answer."  Maybe
>it would be sufficient to just measure the amount of time it took to obtain the
>*first* occurrence of the right answer.
>
>For example:  In a given test position, suppose the correct answer is c1e3.
>Then simply measure how long it took before the engine first started looking at
>c1e3.
>
>A more useful measure might be to measure how long it took for the engine to
>find and keep c1e3 for a fixed amount of time, such as one minute.
>
>How these times are to be recorded seems to be a detail to be worked out.
>
>Bob D.
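
The two timing measures proposed in the quoted text could be sketched
like this in Python. Everything here is an assumption for illustration:
the log format (a list of (seconds, best move) pairs, one entry each time
the engine reports a new best move), the function names, and the reading
of "first started looking at" as "first reported as best move". How such
a log is obtained from a real engine is left open.

```python
from typing import List, Optional, Tuple

# Each entry: (seconds since search start, best move reported at that time).
BestMoveLog = List[Tuple[float, str]]

def time_to_first_occurrence(log: BestMoveLog, target: str) -> Optional[float]:
    """Time at which target first shows up as best move, or None."""
    for t, move in log:
        if move == target:
            return t
    return None

def time_to_find_and_keep(log: BestMoveLog, target: str,
                          total_time: float,
                          hold: float = 60.0) -> Optional[float]:
    """Time at which the engine switched to target and then kept it
    for at least `hold` seconds. The search is assumed to stop at
    total_time. Returns None if target was never held that long."""
    start = None  # when the current uninterrupted run on target began
    for t, move in log:
        if move == target:
            if start is None:
                start = t
        else:
            if start is not None and t - start >= hold:
                return start
            start = None
    if start is not None and total_time - start >= hold:
        return start
    return None

# Made-up log: the engine finds c1e3 early, drops it, then returns to it.
log = [(0.5, "d2d4"), (3.0, "c1e3"), (10.0, "g1f3"), (42.0, "c1e3")]
print(time_to_first_occurrence(log, "c1e3"))                 # 3.0
print(time_to_find_and_keep(log, "c1e3", total_time=180.0))  # 42.0
```

In this made-up log the first measure gives 3.0 seconds, although the
engine abandons the move 7 seconds later; the find-and-keep measure
gives 42.0, which is why the second measure looks more useful.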




Last modified: Thu, 07 Jul 11 08:48:38 -0700
