Computer Chess Club Archives



Subject: Re: How to use a [cough] EPD test suite to estimate ELO

Author: Enrique Irazoqui

Date: 15:51:13 02/12/99



On February 12, 1999 at 14:06:49, KarinsDad wrote:

>Could a large test suite (200+ positions) with random opening, middlegame, and
>endgame positions be created that could then be compared against programs?
>
>Would this make more sense as compared to the contrived test suites which
>attempt to have weird or difficult positions to analyze?

I don’t think it is just a matter of quantity.

I know that some programmers use test suites of a few hundred positions, even a
thousand in one case, and they are still wise enough to mistrust them completely
when they have to decide which beta version is best. You can keep adding
positions in this sort of brute-force approach without knowing whether those
positions are valid samples of what a program will have to deal with in
real-life games.
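For reference, the mechanical side of such a suite is simple: each EPD record is a position plus opcodes such as `bm` (best move) and `id`, and the "score" is just the fraction of positions where the engine's choice matches. A minimal Python sketch, assuming the engine is a stand-in callable and matching `bm` by exact move-string comparison (real EPD handling is more involved):

```python
def parse_epd(line):
    """Split an EPD record into its position fields and opcodes.

    EPD layout: "<board> <side> <castling> <ep> <opcode> <operands>; ..."
    Only simple "<name> <value>" opcodes (e.g. bm, id) are handled here.
    """
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    for op in (fields[4] if len(fields) > 4 else "").split(";"):
        op = op.strip()
        if op:
            name, _, value = op.partition(" ")
            ops[name] = value.strip().strip('"')
    return position, ops

def score_suite(epd_lines, engine_move):
    """Fraction of positions where engine_move(position) is a listed bm move."""
    solved = total = 0
    for line in epd_lines:
        position, ops = parse_epd(line)
        if "bm" not in ops:
            continue  # no best-move opcode, nothing to score
        total += 1
        if engine_move(position) in ops["bm"].split():
            solved += 1
    return solved / total if total else 0.0

# Hypothetical one-position suite; the lambda stands in for a real engine.
suite = [
    'r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - '
    'bm O-O; id "demo.1";',
]
print(score_suite(suite, lambda pos: "O-O"))  # -> 1.0
```

The sketch also makes Enrique's objection concrete: nothing in this loop knows how representative the positions are, so the number it returns is only as meaningful as the suite's composition.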

Tiger comes to mind. In the endings, it has no idea about bad bishops, Philidor
and Lucena endings, draws in KP vs. KQ when the pawn is on the seventh rank on
the a, c, f, or h files, etc. Still, Tiger is better than most in the endgames I
have seen it play. Yet it does badly in endgame tests, because those tests are
full of Lucenas and Philidors and bad bishops, and they don't reflect real life.
I mean: how many positions does it take to check a program's playing ability in
specific rook endings like the Philidor? How many, and of what kind, for
passed-pawn evaluation? Etc. Go brute force and you will end up with a result
that won't work as a reflection of reality. And this is only the endgame, which
is much more systematically structured and studied than the middlegame. How do
you approach a middlegame test set? What kinds of tactics do we examine, what
kind of positional knowledge, and in what proportion?

Imagine the disaster of an IQ test consisting of 200 or 1000 questions put
together by brute force, without prior, systematic knowledge of how much weight
each question carries relative to what the tester is looking for. And after
all, we are talking about an IQ test for programs.

We just don't have this kind of systematic, explanatory knowledge of chess. So
we are better off relying on "intuition" than on test sets, no? It is a more
scientific approach. :)

Enrique



>KarinsDad


