Author: John Warfield
Date: 11:27:39 12/12/99
On December 12, 1999 at 06:26:50, Enrique Irazoqui wrote:

>On December 11, 1999 at 20:18:45, John Warfield wrote:
>
>>On December 11, 1999 at 19:46:50, Bertil Eklund wrote:
>>
>>>On December 11, 1999 at 17:52:56, Tom King wrote:
>>>
>>>>Which of the well known test suites predicts the strength of chess programs most
>>>>accurately?
>>>>
>>>>I ask this, because I recently made some *slight* mods. to the evaluation
>>>>function in my program, Francesca. I ran the LCT-2 suite, and the results
>>>>indicated that it was a wash - the modification gave me about 5 ELO points,
>>>>apparently.
>>>>
>>>>I then ran a series of fast games against another amateur program. I realize
>>>>it's important to play a large number of games, to reduce the margin of error,
>>>>so I ran two matches of 65 games. The result was this:
>>>>
>>>>MATCH 1
>>>>"Normal" Francesca scored 37% against the amateur program.
>>>>
>>>>MATCH 2
>>>>"Modified" Francesca scored 45% against the amateur program.
>>>>
>>>>Quite a difference! It implies that the modification is worth over 50 ELO. I
>>>>guess I need to play more games, against a variety of programs to verify whether
>>>>this improvement is real, or imaginary.
>>>>
>>>>Anyhow, beware of reading too much into ELO predictions of test suites..
>>>>
>>>>Cheers All,
>>>>Tom
>>>
>>>Hi!
>>>
>>>Mr Irazoquis secret test-suite is very impressing! I think it's about 111
>>>positions. He can predict a new programs strength better than any other test I
>>>have seen so far. If his predictions remains as good as his previous results, I
>>>hope we can stop publishing our list and just play for fun.
>>>
>>>Bertil SSDF
>>
>> Why is this Test secret??
>
>I don't publish my test because the moment I do it will be cooked and become
>worthless. This is one of the reasons that make well known test suites
>inaccurate, aside from the fact that they have few positions, some of these
>positions are ambiguous or plain wrong and the rating formula doesn't make
>sense. Results are so erratic and unrealistic that, for example, Fritz comes
>best at the BS test and worst at the BT. Etc. etc.
>
>A couple of months ago we started talking about test suites during one of the
>Rebel GM games at ICC, and a programmer was straightforward enough to say that a
>test won't work because he would cook it next day...
>
>My test has by now 130 positions not included in any other test and took me 11
>months so far to put it together, and quite a bit longer to figure it out, so
>you can imagine that I feel quite reluctant to throw it to the garbage. But it
>is a bit of a catch 22 situation: If I don't publish it, no one will trust it;
>if I do, no one should. :(
>
>In case you are interested, this is my current result of latest programs:
>
>PIII-500     Test    SSDF scale
>RT              0    2691
>CM6K          -16    2675
>N732          -27    2664
>F6-F6a        -33    2658
>F532          -33    2658
>H732          -38    2653
>J5            -70    2621
>C171         -104    2587
>
>Now I am running it with Shredder 4, Genius 6.5 and Zarkov 5, but it takes 2
>boooooring days per program and I feel quite lazy at the moment.
>
>Enrique

Hi Enrique,

If you ever decide to release this test, I would like to be one of the first to receive it.
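
For anyone who wants to check the arithmetic in Tom King's quoted match results, here is a rough sketch. It assumes the standard logistic Elo formula and treats every game as an independent win/loss trial, which ignores draws and so probably overstates the noise somewhat. The function names `elo_diff` and `score_margin` are made up for this example and are not taken from any program mentioned in the thread.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (standard logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_margin(score, games, z=1.96):
    """Approximate 95% margin on a score fraction.

    Assumption: binomial approximation, i.e. each game is an independent
    win/loss trial. Draws are ignored.
    """
    return z * math.sqrt(score * (1.0 - score) / games)


GAMES = 65                      # games per match, from Tom's post
normal, modified = 0.37, 0.45   # match scores, from Tom's post

# Rating gain implied by moving from 37% to 45% against the same opponent.
gain = elo_diff(modified) - elo_diff(normal)

# Statistical margin on a single 65-game match score.
margin = score_margin(normal, GAMES)

print(f"implied gain: {gain:.0f} Elo")
print(f"95% margin on a {GAMES}-game score: +/- {margin * 100:.0f} percentage points")
```

Under these assumptions the implied gain is a bit under 60 Elo, in line with Tom's "over 50", but the margin on each 65-game score is roughly 12 percentage points, wider than the 8-point gap between the two matches. That fits his own caution that more games are needed before calling the improvement real.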
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.