Author: Enrique Irazoqui
Date: 11:59:35 12/12/99
On December 12, 1999 at 14:27:39, John Warfield wrote:

>On December 12, 1999 at 06:26:50, Enrique Irazoqui wrote:
>
>>On December 11, 1999 at 20:18:45, John Warfield wrote:
>>
>>>On December 11, 1999 at 19:46:50, Bertil Eklund wrote:
>>>
>>>>On December 11, 1999 at 17:52:56, Tom King wrote:
>>>>
>>>>>Which of the well known test suites predicts the strength of chess programs most
>>>>>accurately?
>>>>>
>>>>>I ask this, because I recently made some *slight* mods. to the evaluation
>>>>>function in my program, Francesca. I ran the LCT-2 suite, and the results
>>>>>indicated that it was a wash - the modification gave me about 5 ELO points,
>>>>>apparently.
>>>>>
>>>>>I then ran a series of fast games against another amateur program. I realize
>>>>>it's important to play a large number of games, to reduce the margin of error,
>>>>>so I ran two matches of 65 games. The result was this:
>>>>>
>>>>>MATCH 1
>>>>>"Normal" Francesca scored 37% against the amateur program.
>>>>>
>>>>>MATCH 2
>>>>>"Modified" Francesca scored 45% against the amateur program.
>>>>>
>>>>>Quite a difference! It implies that the modification is worth over 50 ELO. I
>>>>>guess I need to play more games, against a variety of programs to verify whether
>>>>>this improvement is real, or imaginary.
>>>>>
>>>>>Anyhow, beware of reading too much into ELO predictions of test suites..
>>>>>
>>>>>Cheers All,
>>>>>Tom
>>>>
>>>>Hi!
>>>>
>>>>Mr Irazoqui's secret test suite is very impressive! I think it's about 111
>>>>positions. He can predict a new program's strength better than any other test I
>>>>have seen so far. If his predictions remain as good as his previous results, I
>>>>hope we can stop publishing our list and just play for fun.
>>>>
>>>>Bertil SSDF
>>>
>>>  Why is this test secret??
>>
>>I don't publish my test because the moment I do it will be cooked and become
>>worthless. This is one of the reasons that make well known test suites
>>inaccurate, aside from the fact that they have few positions, some of these
>>positions are ambiguous or plain wrong and the rating formula doesn't make
>>sense. Results are so erratic and unrealistic that, for example, Fritz comes
>>best at the BS test and worst at the BT. Etc. etc.
>>
>>A couple of months ago we started talking about test suites during one of the
>>Rebel GM games at ICC, and a programmer was straightforward enough to say that a
>>test won't work because he would cook it next day...
>>
>>My test has by now 130 positions not included in any other test and took me 11
>>months so far to put it together, and quite a bit longer to figure it out, so
>>you can imagine that I feel quite reluctant to throw it in the garbage. But it
>>is a bit of a catch-22 situation: If I don't publish it, no one will trust it;
>>if I do, no one should. :(
>>
>>In case you are interested, this is my current result for the latest programs:
>>
>>PIII-500   Test   SSDF scale
>>RT            0   2691
>>CM6K        -16   2675
>>N732        -27   2664
>>F6-F6a      -33   2658
>>F532        -33   2658
>>H732        -38   2653
>>J5          -70   2621
>>C171       -104   2587
>>
>>Now I am running it with Shredder 4, Genius 6.5 and Zarkov 5, but it takes 2
>>boooooring days per program and I feel quite lazy at the moment.
>>
>>Enrique
>
>
>  Hi Enrique
>
>
>  If you ever decide to release this test, I would like to be one of the first
>to receive it.

Sure, but don't hold your breath... :)

Enrique
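
For anyone who wants to check the arithmetic in Tom King's quoted message (37% vs. 45% over two 65-game matches, "over 50 ELO"), here is a minimal sketch. It assumes the usual logistic Elo formula and, since the draw counts were not posted, treats every game as a win or a loss, so the error estimate is an upper bound. The function names are made up for illustration and do not come from any program mentioned in the thread.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (standard logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_std_error(score, games):
    """Standard error of a score fraction, treating each game as win/loss.
    Draws lower the variance, so this is an upper bound (assumption)."""
    return math.sqrt(score * (1.0 - score) / games)

GAMES = 65                    # games per match, from the quoted post
normal, modified = 0.37, 0.45  # match scores, from the quoted post

# Rating gain implied by the two match scores against the same opponent.
elo_gain = elo_from_score(modified) - elo_from_score(normal)

# Standard error of the difference between two independent match scores.
se_diff = math.hypot(score_std_error(normal, GAMES),
                     score_std_error(modified, GAMES))

print(f"Implied gain: {elo_gain:.0f} Elo")
print(f"Score difference: {modified - normal:.2f}, standard error: {se_diff:.3f}")
```

Under these assumptions the implied gain is a bit under 60 Elo, which matches "over 50 ELO", but the 8-point score difference is smaller than its own standard error, so two 65-game matches cannot tell a real improvement from noise. That agrees with Tom's own conclusion that more games are needed.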
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.