Author: Mike S.
Date: 05:49:21 06/10/04
On June 09, 2004 at 19:05:40, Rolf Tueschen wrote:

>(...)
>Now Hagra, an anonymous author with a good knowledge of statistics and chess has
>made the strongest critic against the test that I know of. He basically doubts
>that a chess position from real life chess can test a machine because it is
>difficult to decide why the machine has adopted a specific continuation.

Actually this would be a criticism of *all* test suites, because all of them follow the same concept: (simply) *find the move* (but not: find the move and give me a perfect explanation/evaluation/analysis of why it is best). So, if that criticism were valid, it would apply not only to the WM Test but obviously to all test suites. So all authors and users of test suites have failed to spot that fatal mistake for so many years...? :-))

I don't claim that one single man can never have the unique and correct idea while all others are wrong, but such "Galileo" cases (and also "Leonardo" cases :-)) are very, very rare.

>(...) The also
>here known author Michael Scheidl assisted in that fight.

Thanks for addressing me as a known author. So it seems that I have achieved at least a bit (it was a lot of hard work! :-))

>IMO the whole
>argumentation is unfair because if already a general critic is sound and comes
>to a negative judgement then the practical argument has no more sense at all.

How do you see *if* such criticism is sound for a specific position? There's no other choice than to analyse chess-wise (!) and illustrate the criticism with variations etc. The latest provided PGN is a good example of how such criticism should be presented.

The "general" criticism is not generally valid, because test authors of course usually choose typical test-like (or test-fitting) moves to ensure, or at least to make it most likely, that they won't be chosen for the wrong reasons without proper understanding. Typical are *sacrifices* no engine would choose "just for fun", so to speak.
You don't waste a rook without seeing the gain. Of course you can't, or in general shouldn't, use positions for a test where the solution move is a normal, boring 08/15 (run-of-the-mill) move which in itself doesn't allow any conclusion about what it is played for. Below I give two examples from my Quicktest to illustrate this. Tell me if you find engines which play these solutions for the wrong reasons. :-)

(For a *very big* test, 1000+ positions, the above could possibly be ignored, because when e.g. engine A plays strong moves in 867 out of 1000 and engine B in only 427, it will always tell a lot about the relative analysis power of A and B, regardless of the "correct reason" question.)

Did you take a look at the currently available WM Test results of 230 (!!) engines yet? You'll find the known strong engines at the top, medium engines (good amateurs, older pros) in the middle ranks and weaker engines at the end of the ranking list. (With only very few exceptions or "surprising" rankings.) How would you explain these results if that whole test (and method) weren't valid?? Is it wizardry? :-)

>(...) it is also
>possible to play Rad8 and way later Re3 instead of the test solution Re3.

But Rad8 has no forcing character IMO and threatens nothing special. Re3 creates the strong threat Rxg3. (Just an observation. - I hope Mikhail will add some comments about this.)

>All
>who know details about tests know that the fact of a second solution decreases
>the value of a test position.

True - but only when there really is a second solution of *almost the same strength*; I think alternatives which are clearly weaker are not a problem, because the challenge is to find the *best* move and not just a good move, *in analysis* (it could be discussed whether that is different in practical games). Of course, perfectly clear positions without alternatives, e.g. only move X draws and all others lose, are preferable.
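The 867-versus-427 comparison in the parenthesis above can be made a bit more concrete with a rough significance check. This is only a sketch: the function name `two_proportion_z` is made up for illustration, and it assumes every position is an independent pass/fail trial and that the two engines' results are independent samples. Neither is strictly true for a real test suite, where both engines see the same positions.

```python
import math


def two_proportion_z(solved_a, solved_b, n):
    """Pooled two-proportion z statistic for two solve counts out of n positions each.

    Assumption (not from the post): each position is an independent
    pass/fail trial and the two engines are independent samples.
    """
    p_a = solved_a / n
    p_b = solved_b / n
    # Pooled solve rate under the hypothesis "both engines are equally strong"
    pooled = (solved_a + solved_b) / (2 * n)
    standard_error = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return (p_a - p_b) / standard_error


# The hypothetical numbers from the post: 867 vs. 427 solutions out of 1000
z = two_proportion_z(867, 427, 1000)
print(f"z = {z:.1f}")
```

With these numbers z comes out above 20, far beyond the usual threshold of about 2, so a gap of that size would be very hard to attribute to lucky or "wrong reason" solutions alone. For small gaps between strong engines the same calculation gives much weaker evidence.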
>But the readers shouldn't forget that the main problem for such engine tests is
>the finding of positions which allow to test what the test founder pretended.
>Here the WM Test allegedly can test the ability to analyse. (...)

It can, because 1. we know the good, difficult continuations of the test positions, and 2. during the test run, engines have to analyse these. So the engines analyse, and we can compare and judge the analysis results (at move #1). Basically that's no different from the way 99.9% of all test suites are done.

>In other words, the academic doctor MG claims a deeper meaning
>with his test but in reality he has put together these 100 positions without
>showing the validity of the positions for his own insinuations into the test!

IMO the question of validity has to be answered chess-wise in the first place, and MG has done that by giving solution variations, subvariations and comments in the data provided with the WM Test package. Nevertheless, many positions require some study by the user to be convinced of and/or to understand the solution. Some I found very difficult and had my doubts too. This may be strength-dependent (it may often be clearer for stronger players, and for engines). Nowadays tests just have to be very difficult to reveal any significant differences between strong engines.

Anyway, it will always be best to discuss test positions and suites based on direct chess-wise analysis etc., which may be supported (or started) by engine output only. This will lead to constructive dialogue and fun with chess itself. One point of the discussion was that strange observations of engine output *only* are not sufficient to base valid criticism on, and if you look at the last criticism issued, it seems that there is consensus about this :-) maybe with the exception of you (?).

Regards,
Michael Scheidl

[D]1n1r1rk1/ppq2ppp/3p2b1/3B1NP1/4PB1R/bP2P2P/P1P5/3KQ1R1 w - - 0 1
1.Qc3! (Quick-01)

[D]3Q4/3p4/P2p4/N2b4/8/4P3/5p1p/5Kbk w - - 0 1
1.Qa8! (Quick-03)
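For readers who want to set up the two Quicktest positions by hand, the strings after [D] are in FEN notation. The following is a minimal sketch that decodes only the piece-placement field into a square-to-piece mapping. The helper name `parse_placement` is made up for illustration, and the code does not check move legality, side to move or castling rights.

```python
def parse_placement(fen):
    """Map squares like 'e1' to piece letters, using only the first FEN field.

    Uppercase letters are White pieces, lowercase are Black; digits in the
    FEN are runs of empty squares. Ranks are listed from 8 down to 1.
    """
    ranks = fen.split()[0].split("/")
    if len(ranks) != 8:
        raise ValueError("expected 8 ranks in the placement field")
    board = {}
    for rank_index, rank in enumerate(ranks):
        rank_number = 8 - rank_index
        file_index = 0
        for ch in rank:
            if ch.isdigit():
                file_index += int(ch)  # skip that many empty squares
            else:
                if file_index > 7:
                    raise ValueError(f"rank {rank_number} is too long")
                board["abcdefgh"[file_index] + str(rank_number)] = ch
                file_index += 1
        if file_index != 8:
            raise ValueError(f"rank {rank_number} does not cover 8 files")
    return board


quick01 = parse_placement("1n1r1rk1/ppq2ppp/3p2b1/3B1NP1/4PB1R/bP2P2P/P1P5/3KQ1R1 w - - 0 1")
quick03 = parse_placement("3Q4/3p4/P2p4/N2b4/8/4P3/5p1p/5Kbk w - - 0 1")

# The solution moves start from these squares: 1.Qc3 (Quick-01) and 1.Qa8 (Quick-03)
print(quick01["e1"], quick03["d8"])
```

Both strings decode cleanly, with the white queen on e1 in Quick-01 and on d8 in Quick-03, and the target squares c3 and a8 empty.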