Author: Don Dailey
Date: 13:50:22 12/13/97
I have an idea for generating a positional problem set to
measure our chess programs against. This will take some
cooperation and should involve many programs.
1. Start with (n) positions from Grandmaster games and make
a note of the move the grandmaster played.
2. Run each participating program for some length of time (t)
on each position. Note the move the program chooses at
the end of the allotted time.
3. Determine the positions on which there is a "consensus":
   the best move must be agreed on by all programs.
4. Throw out the "easy" ones. Consider this our problem set.
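The four steps above could be sketched as a small filter. This is a hypothetical illustration, not anything Don specified; the input format (a dict per position with the GM's move and each engine's chosen move, in matching notation) is assumed:

```python
def consensus_positions(positions):
    """Keep only positions where every engine's chosen move
    matches the grandmaster's move (the "consensus" test).
    The "easy" filter of step 4 would then additionally drop
    positions that every program solves very quickly."""
    kept = []
    for pos in positions:
        if all(move == pos["gm_move"] for move in pos["engine_moves"]):
            kept.append(pos)
    return kept

suite = consensus_positions([
    {"fen": "pos1", "gm_move": "Nf3", "engine_moves": ["Nf3", "Nf3"]},
    {"fen": "pos2", "gm_move": "e4",  "engine_moves": ["e4", "d4"]},
])
# only pos1 survives: the engines disagree on pos2
```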
For this to work, t (time) should be pretty high. The rationale
is that all our programs are of Grandmaster strength, given enough
time to think. If all the "grandmasters" in our sample (including
the human who played the game) agree on the best move, the evidence
is fairly strong that the move is indeed the best move and the noise
will be quite low.
Since all the programs will be capable of solving every problem,
the issue will be how quickly the "solutions" are found. A
scoring system should be determined in advance that we can use
to rate our programs based on the problem set that is eventually
chosen.
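One simple possibility for such a scoring system (my own sketch, not part of the proposal: the credit formula and the 30-minute limit of 1800 seconds are assumptions) is to give full credit for an instant solution, scaling linearly down to zero at the time limit:

```python
def score(solve_times, t_limit=1800):
    """Credit each solved position by how quickly it was solved.
    solve_times: seconds until the solution was found and kept,
    or None if the position was never solved within t_limit.
    Full credit (1.0) for an instant solution, 0 at the limit."""
    total = 0.0
    for t in solve_times:
        if t is not None and t < t_limit:
            total += 1.0 - t / t_limit
    return total

# e.g. one instant solve, one at the halfway mark, one failure:
# score([0, 900, None]) -> 1.5
```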
The easy problems should be filtered away. If every program
chooses the right move very quickly, we probably should not
consider this position worthy of including in our problem set.
The goal is to have a single "best" move, but the move should
not be trivial.
I have no idea whether this technique will produce a useful
positional problem set. But I would be willing to prepare
the initial fen positions from a set of random positions I
prepared from master games. These positions are completely
random and were culled from one of the CDROM databases I have.
Here is what I would need from each participant:
a) The time your program first chose and kept the move that
   was its final choice. It might also be useful to know
   whether the program "wavered" on earlier iterations: did it
   change its mind a few times?
b) The program you are using and the hardware you are running
on. Probably we should adjust for hardware and choose
our run times based on something pretty standard, like a
pentium pro 200. The exact time and hardware is not
critically important, but it should meet some minimum
requirement so as to not degrade the test.
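The quantity asked for in (a) could be computed from an engine's per-iteration output. A hypothetical helper, assuming a list of (time, best_move) pairs in search order:

```python
def first_kept_and_wavers(iterations):
    """iterations: list of (time_seconds, best_move) pairs, one per
    search iteration, in order. Returns (time at which the final
    move was first chosen and then kept to the end, number of
    times the program changed its mind)."""
    if not iterations:
        return None, 0
    final = iterations[-1][1]
    wavers = sum(1 for a, b in zip(iterations, iterations[1:])
                 if a[1] != b[1])
    # walk back from the end while the best move is still the final choice
    kept_time = iterations[-1][0]
    for t, move in reversed(iterations):
        if move != final:
            break
        kept_time = t
    return kept_time, wavers

# chose e4 at 1s, switched to d4 at 3s, kept it at 7s:
# -> first kept d4 at 3s, wavered once
```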
I would like some feedback on this from you guys. Do you think
it is worth pursuing? Will it produce a useful set? Who is
interested in participating? Do you have some suggestions or
improvements?
If enough people want to try this, I have 1000 fen positions
with grandmaster moves attached to them. I can run Cilkchess
through these 1000 positions (we have access to lots of hardware
here) and post the results of all 1000 positions, along with
a reduced set reflecting those positions Cilkchess "consents"
to being in the set. I suspect this will cut the set down a lot.
I will suggest that each problem is run for 30 minutes on each
machine of at least Pentium Pro 200 performance (or you can
adjust for lesser hardware). I am expecting that after Cilkchess
performs the first pass, there will only be a fraction of the
original 1000 positions left and everyone will be willing to
run 30 minutes on each one.
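Adjusting for lesser hardware could be as simple as scaling the allotted time so each machine searches roughly the same number of nodes as the reference machine. This is my own sketch of that adjustment; the nodes-per-second figures are made up:

```python
def adjusted_time(base_seconds, machine_nps, reference_nps):
    """Scale the search time so a machine searching machine_nps
    nodes/second covers about as many nodes as the reference
    machine (e.g. a Pentium Pro 200) does in base_seconds."""
    return base_seconds * reference_nps / machine_nps

# a machine half as fast as the reference gets twice the time:
# adjusted_time(1800, 100000, 200000) -> 3600.0
```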
Once this step is done we may have a useful positional problem
set. Even if we do not, perhaps we will learn something!
The participants would be the actual programmers whenever possible
but we can get the data from any source, and I know there are a
lot of enthusiastic chess program owners who might be interested
in contributing the test time.
I will wait for feedback before proceeding.
-- Don