Computer Chess Club Archives



Subject: Re: How to use a [cough] EPD test suite to estimate ELO

Author: Dann Corbit

Date: 14:32:35 02/12/99



On February 12, 1999 at 17:03:53, KarinsDad wrote:
[snip]
>But this is irrelevant. Isn't it? If you have a large enough sample set, the
>correct answer is correct, regardless of whether the computer thought it was
>correct for a different reason. When you actually have computers or humans play
>games, it doesn't matter why a move is chosen, just that given a large sample
>set of games, there is a win-draw-loss record against opponents of certain
>"strength" and hence, you have a rating accordingly.
Depends on what you want to measure.  If you want to know the computer's
ability, you should measure whether the choice was accidental or not.  Let's say
we change the answers from true/false to something like a short-answer format.
In other words, a monkey can get 50% right by pushing the true/false buttons at
random, but if you ask it to explain why each time, its score will drop to 0.
The monkey does not know why.
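The guessing-vs-understanding point can be illustrated with a quick simulation (a hypothetical sketch, not from the original post): a random guesser scores about 50% on true/false questions, but 0% when it must also supply the reasoning.

```python
import random

def simulate_guesser(num_questions, rng):
    """A 'monkey' that answers true/false questions by coin flip.

    Returns (fraction answered correctly, fraction explained correctly).
    The guesser can never explain its answers, so the second value is 0.
    """
    answers = [rng.choice([True, False]) for _ in range(num_questions)]
    truth = [rng.choice([True, False]) for _ in range(num_questions)]
    correct = sum(a == t for a, t in zip(answers, truth))
    return correct / num_questions, 0.0

rng = random.Random(42)  # fixed seed so the run is repeatable
hit_rate, explain_rate = simulate_guesser(100_000, rng)
print(f"guessed correctly: {hit_rate:.3f}, explained correctly: {explain_rate:.3f}")
```

Over many questions the hit rate converges to 0.5, while the "explain why" rate stays at zero — which is exactly why a suite that only checks the move can overrate a lucky guesser.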

>Computers and humans both blunder into bad positions either for poor evaluation
>reasons, or when bad positions are beyond their event horizon. Your skill comes
>in based on the percentage of times that you make "correct" or nearly correct
>moves and the percentage of times you make outright blunders. So if a program
>detects a correct move, then for that position, it did ok, regardless of the
>reasons why.
>
>Stating that you must get not only the correct move, but the pv and the ce (what
>is a ce? I asked this question before and do not think I got a response, sounds
>like an evaluation number) seems overkill.
The pv (principal variation) would not have to be identical, but should be
present for verification.  The ce is the centipawn evaluation.  So if the ce is
-1000, then the computer thinks it is down by roughly a queen (ten pawns).
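For reference, an EPD record carries its operations (`bm`, `ce`, `pv`, `id`, etc.) after the four board fields, each terminated by a semicolon.  A minimal parser and scorer might look like the sketch below — the opcode names follow the EPD standard, but the scoring rule (require the best move, and a ce within some tolerance) is my own illustration:

```python
def parse_epd(line):
    """Split an EPD line into its board part and an opcode dictionary.

    Example line:
      'r1bqkbnr/... w KQkq - bm Nf6; ce -25; id "pos.001";'
    """
    fields = line.strip().split(None, 4)
    board = " ".join(fields[:4])  # placement, side to move, castling, en passant
    ops = {}
    if len(fields) == 5:
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                name, _, value = op.partition(" ")
                ops[name] = value.strip().strip('"')
    return board, ops

def score_answer(ops, engine_move, engine_ce, tolerance=100):
    """Credit the engine only if it finds the bm move AND its centipawn
    evaluation is within `tolerance` of the suite's ce opcode."""
    if ops.get("bm") != engine_move:
        return False
    if "ce" in ops:
        return abs(int(ops["ce"]) - engine_ce) <= tolerance
    return True

board, ops = parse_epd('4k3/8/8/8/8/8/8/4K2R w K - bm O-O; ce 900; id "kq.1";')
print(score_answer(ops, "O-O", 850))   # best move found, eval close enough
print(score_answer(ops, "Kd1", 900))   # wrong move, no credit
```

Requiring the ce (and spot-checking the pv) is what separates an engine that understood the position from one that stumbled onto the right move.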

>>  The problems are not weighted equally.   The trivial problems are not
>>worth much and the difficult problems are worth a lot.  The test suite would
>>then be calibrated against every machine/program in the SSDF and every GM
>>willing to give it a go.  Every amateur program could be pitted against the test,
>>and any player with a rating could give it a go.  All the data accumulated could
>>be used to create a test suite which accurately ranks players, whether machine
>>or human.  By having several thousand positions, you could weed out memorizers.
>>Besides which, anyone who could memorize several thousand positions and their
>>solutions could probably do well on their own anyway.  But even having all of
>>this, it would be not too difficult to create programs that could cheat against
>>the test.
>
>Cheating against the test is not the question, is it? The question is whether a
>test can be devised that could approximate rating. If someone cheats, what have
>they accomplished? Once their program goes head to head with others in
>competition, then their true rating will eventually shine through.
Very true.  I was just pointing out that no test suite can be made foolproof.
It is sort of like the Whetstone benchmark.  Some shady C compiler writers
actually put in code to recognize it and apply special optimizations, which made
their compilers look better in evaluations.  But people caught on, and much
larger test suites came about.
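The weighted-suite idea quoted above — hard positions worth more, with the total calibrated against rated players — could be sketched like this.  The weights, results, and linear calibration here are entirely hypothetical; a real mapping would be fit to the accumulated SSDF and GM data the post describes:

```python
def suite_score(results, weights):
    """Weighted fraction of problems solved: difficult problems count more."""
    total = sum(weights)
    earned = sum(w for w, solved in zip(weights, results) if solved)
    return earned / total

def estimate_elo(score, lo=1200, hi=2800):
    """Map a 0..1 suite score onto a rating range with a (hypothetical)
    linear calibration; a real one would be fit to SSDF/GM results."""
    return lo + score * (hi - lo)

weights = [1, 1, 2, 3, 5, 8]                       # difficulty weights (made up)
results = [True, True, True, False, True, False]   # which problems were solved
print(round(estimate_elo(suite_score(results, weights))))  # prints 1920
```

With several thousand positions, the weighted score would be far harder to inflate by memorization or luck than a raw solved-count.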






Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.