Author: Heinz van Kempen
Date: 16:55:24 10/17/05
On October 17, 2005 at 16:25:24, Ray Banks wrote:

>I think the answer is simple - you just can't rely on the rating at all until
>you get 300 games or more in the list. I think if I was Heinz I just wouldn't
>put any engine into the list until it hits that figure.

Hi Robert, Uri, Ray and all,

maybe Ray is correct and we should wait for 300, 400 or 500 games before giving ratings and results. We did so when running the Ktulu 7.0 test, and this was fine because we had big statistical fluctuations in that one.

So what is happening here? Maybe we can find some explanations for the following results (all these matches were run by me):

Fruit 2.2 Uri vs. Deep Fritz 8 512MB 2CPU --- 21,5:20,5

I think this result is okay. Fruit is using only 256 MB, while Fritz is using 2 CPUs and 512 MB. I do not want to discuss here whether this makes sense, because it is a test for deep versions. But the result is okay when you know that Fritz is not Fruit's favourite opponent. Still, it hurts the rating a bit, since Deep Fritz is "only" at 2745 Elo and was at 2735 before the match.

Fruit 2.2 Uri vs. Shredder 9 Columbus'egg 9g --- 27,5:13,5 !!!

A devastating result. Shredder 9 Columbus'egg 9g was only a bit below default, at 2750, before this match; now it has dropped to 2698. The Elo performance from this match alone is 2832, so this should give Fruit 2.2 Uri a big push in Elo. It does not. Why not? Because there were few games with the Columbus setting before this match (only 110 games), so its own rating is still unstable, and I was keen on having results for both.

As far as Fruit 2.2 Uri is concerned, running this match was a (temporary) mistake. But more games for Columbus'egg will come now, and when this setting does better against other engines and comes close to default again, you can imagine that this will give a push to Fruit 2.2 Uri too, and a big one.

Let us continue.
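To make the "few games" argument concrete, here is a minimal sketch of how an Elo performance and a rough error margin can be derived from a match result. It assumes the common logistic Elo formula and a normal approximation for the score; the function names, the 95% level (z = 1.96) and the 30% draw ratio are illustrative assumptions, not the method used for the rating list, so the figures it prints need not reproduce the ones quoted in this post.

```python
import math


def elo_diff_from_score(score_fraction):
    """Elo difference implied by a score fraction (logistic model, an assumption)."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


def performance_with_margin(wins, draws, losses, opponent_elo, z=1.96):
    """Return (performance, low, high) in Elo for one match.

    The interval comes from a normal approximation of the score,
    which is rough for short matches and lopsided results.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * score ** 2) / games
    std_error = math.sqrt(variance / games)
    eps = 1e-6  # keep fractions inside (0, 1) so the logarithm stays defined

    def clamp(s):
        return min(max(s, eps), 1.0 - eps)

    fractions = (clamp(score), clamp(score - z * std_error), clamp(score + z * std_error))
    return tuple(opponent_elo + elo_diff_from_score(s) for s in fractions)


def even_match_margin(games, draw_ratio=0.3, z=1.96):
    """Approximate +/- Elo margin for an even match; draw_ratio is an assumed value."""
    variance = (1.0 - draw_ratio) * 0.25  # only decisive games add variance at a 50% score
    half_width = z * math.sqrt(variance / games)
    return elo_diff_from_score(min(0.5 + half_width, 1.0 - 1e-6))


if __name__ == "__main__":
    # 27.5 points out of 41 games, as in the Columbus'egg match above.
    print(round(elo_diff_from_score(27.5 / 41)))
    # +16 =4 -0 against an opponent rated 2570 (the Scorpio result further down).
    print([round(x) for x in performance_with_margin(16, 4, 0, 2570)])
    # How the margin shrinks as the number of games grows.
    for n in (40, 110, 300, 500):
        print(n, round(even_match_margin(n)))
```

Under these assumptions the margin after a 40-game match is several times wider than after 300 games, which is the effect Ray describes.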
The following matches were played against stronger amateurs on the same machine, under the same GUI, with Fruit WCCC'05 History Threshold 50, which is not prone to the altered parameters bug:

- against Scorpio 1.3 (2570 after the match): 90% (+16, =4, -0) - Elo performance here is 2890
- against ET Chess 300805 (2543): Elo performance only 2727 after 21 games

Next machine (again two matches):

- against Jonny 2.82: 63,1% out of 42 games and a miserable Elo performance of 2687 ???
- against Naum 1.82 on the same machine: an Elo performance of 3001 after an initial result of 8:0. Meanwhile I see 10:0 here
- and against WildCat: an initial Elo performance of 2838 after 10 games

Of the results by other CEGT testers, the one that hurt most was against Spike, with an Elo performance of only 2661.

I can already predict that the rating will shoot up as soon as I have more games with Columbus'egg G9. Games will be available for download this Tuesday evening with comments, and you can all check that there is nothing unusual. My choice of opponents, however, was not the best with regard to Columbus'egg, but this will be corrected, as I will now first run a few more matches for that one.

Comments welcome.

Best Regards
Heinz

http://www.husvankempen.de/nunn/
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.