Author: Vincent Diepeveen
Date: 17:13:10 12/17/02
On December 17, 2002 at 19:49:47, Bob Durrett wrote:

There is a lot of truth in your statement. The best test set would indeed be a combination of positions of the kind "you must find this move within 3 minutes or your engine loses the game if you don't play it." However, such a realistic test would hardly contain any tactical positions anymore. It is trivial nowadays to make something tactically stronger than any commercial program (assuming equal hardware, of course, which with Brutus on an FPGA chip and Diep on a supercomputer isn't exactly close to the truth nowadays). It's about who makes the worst moves.

One thing no test set can measure, however, is how a program plays. Some programs are world champions at bringing themselves into trouble despite not being too bad overall: they play without a plan and then reach a position which was correct to enter, but then they do not know how to play on. I fear Diep is one of the programs belonging to that league :)

On the other hand, some programs play very dubious chess (Junior, for example) but manage to win because they know how to continue in the positions they create. On any positional test set, Junior will come out as the positionally worst commercial program, I bet (a lot of that positional badness is, I guess, caused by the selective form of searching in Junior: a bad positional move delays the problems, whereas the good moves get searched a lot deeper and get a different score, so they do not pop up as best). Yet it is a three-fold world champion: very interesting, and definitely no coincidence!

Another problem for any test set that tries to measure how good a program is, is the patzer problem. In general, many test sets nowadays focus on patzer moves. A big patzer test set is the WM test; another one is gs2930. One of the worst endgame programs in the world (in the commercial league) is Tiger 1. Tiger 2 is far superior in the endgame compared to Tiger 1. Yet Tiger 1 scores better on endgame test sets than Tiger 2.
Very weird. Fritz, on the other hand, is a big patzer yet scores very well on kingside problems; no program beats it in such test sets. It seems created to mate opponents :) Patzer moves are hard to prevent, and of course they should not always be prevented: patzer moves usually give a nice game, but not always a good game :)

Then another major problem is passive versus active play. Shredder won a bunch of world titles playing very passive chess: at a certain point the opponents start an action, and they lose because of that action. Yet passive engines always score a lot less on all the existing test sets.

Best regards,
Vincent

>On December 17, 2002 at 19:36:09, Vincent Diepeveen wrote:
>
>>On December 17, 2002 at 19:10:42, Dann Corbit wrote:
>>
><snip>
>
>Perhaps a useful test would be to measure how long a chess engine takes to get
>the right answer for a large set of diverse test positions.
>
>There would have to be some simple measure of "getting the right answer." Maybe
>it would be sufficient to just measure the amount of time it took to obtain the
>*first* occurrence of the right answer.
>
>For example: In a given test position, suppose the correct answer is c1e3.
>Then simply measure how long it took before the engine first started looking at
>c1e3.
>
>A more useful measure might be to measure how long it took for the engine to
>find and keep c1e3 for a fixed amount of time, such as one minute.
>
>How these times are to be recorded seems to be a detail to be worked out.
>
>Bob D.
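The two metrics Bob D. proposes in the quote above, time to the *first* occurrence of the correct move and time until the engine finds and *keeps* it for a fixed period, can be sketched roughly as follows. This is only an illustration of the idea, not anyone's actual test harness: the function name and the log format (a list of `(time_in_seconds, best_move)` updates taken from an engine's analysis output) are assumptions of mine.

```python
# Hypothetical sketch of Bob D.'s two proposed metrics (not from the post).
# `updates` is a time-ordered list of (time_seconds, best_move) pairs, one per
# change of the engine's preferred move during a search of `total` seconds.

def first_and_stable_times(updates, correct, hold=60.0, total=180.0):
    """Return (first, stable): the time the engine first switched to `correct`,
    and the earliest time from which it kept `correct` for `hold` seconds.
    Either value is None if the corresponding event never happened."""
    first = None        # time of the first occurrence of the correct move
    stable = None       # start of the first hold of at least `hold` seconds
    held_since = None   # start of the current uninterrupted hold, if any
    for t, move in updates:
        if move == correct:
            if first is None:
                first = t
            if held_since is None:
                held_since = t
        else:
            # The engine switched away: check whether the hold was long enough.
            if held_since is not None and stable is None and t - held_since >= hold:
                stable = held_since
            held_since = None
    # The engine kept the correct move until the end of the search.
    if held_since is not None and stable is None and total - held_since >= hold:
        stable = held_since
    return first, stable
```

For example, with the correct move c1e3 and the update log `[(2.0, "g1f3"), (10.0, "c1e3"), (25.0, "g1f3"), (40.0, "c1e3")]` over a 180-second search, the first occurrence is at 10 seconds, but the first one-minute hold only starts at 40 seconds, which matches Bob's point that the second measure is the more meaningful of the two.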