Computer Chess Club Archives


Subject: the craziest test ever run in CEGT? - some explanations

Author: Heinz van Kempen

Date: 16:55:24 10/17/05



On October 17, 2005 at 16:25:24, Ray Banks wrote:

>I think the answer is simple - you just can't rely on the rating at all until
>you get 300 games or more in the list. I think if I was Heinz I just wouldn't
>put any engine into the list until it hits that figure.

Hi Robert, Uri, Ray and all,

Maybe Ray is right and we should wait for 300, 400 or 500 games before publishing
ratings and results. We did so when running the Ktulu 7.0 test, and that was the
right call, because we had big statistical fluctuations in that one.
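To put rough numbers on those statistical fluctuations, here is a minimal Python sketch (an illustration only, not the method CEGT uses for its error bars) that estimates the 95% margin of a rating from the number of games played. It assumes independent games and ignores draws, which overstates the margin somewhat; `elo_margin_95` is a hypothetical helper name.

```python
import math


def elo_margin_95(games: int, score: float = 0.5) -> float:
    """Approximate 95% error margin (in Elo) of a rating estimated from `games` games.

    Simplifying assumptions: games are independent and each one is a plain
    win/loss (draws ignored), so the per-game score variance is p*(1-p).
    Draws lower the variance, so the real margin is usually somewhat smaller.
    """
    if games <= 0 or not 0.0 < score < 1.0:
        raise ValueError("need games > 0 and 0 < score < 1")
    # Standard error of the mean score fraction.
    se_score = math.sqrt(score * (1.0 - score) / games)
    # Slope of the logistic Elo curve D(p) = 400*log10(p/(1-p)) at p = score.
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    # Delta method: convert the score error to Elo, then take +/- 1.96 standard errors.
    return 1.96 * slope * se_score


for n in (20, 42, 110, 300, 500):
    print(f"{n:4d} games: about +/-{elo_margin_95(n):.0f} Elo")
```

For an even score this gives roughly +/-105 Elo after 42 games and roughly +/-39 Elo after 300, which is why a single 40-game match can move a list rating so much.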

So what is happening here? Maybe we can find some explanations for the following
results (all these matches were run by me):

Fruit 2.2 Uri vs. Deep Fritz 8 512MB 2CPU --- 21.5:20.5

I think this result is okay. Fruit is using only 256 MB, while Fritz is using 2
CPUs and 512 MB. I do not want to discuss here whether this makes sense, because it
is a test for deep versions. The result is understandable when you know that Fritz
is not a favourable opponent for Fruit; still, it hurts Fruit's rating a bit, since
Deep Fritz is "only" at 2745 Elo and was at 2735 before the match.
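Why a narrow win can still cost rating points: under the standard logistic Elo model, the higher-rated engine is expected to score clearly more than 50%. The sketch below assumes a hypothetical rating of 2800 for Fruit purely for illustration (the post does not give its list rating); with any Fruit rating clearly above Deep Fritz's 2735, the expected share of the 42 games exceeds the 21.5 points Fruit actually scored.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Expected score per game (0..1) under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


# Hypothetical: 2800 is an assumed rating for Fruit, used only for illustration.
fruit, fritz, games = 2800.0, 2735.0, 42
expected_points = games * expected_score(fruit, fritz)
actual_points = 21.5
print(f"expected {expected_points:.1f} of {games}, scored {actual_points}")
```

Rating updates move in the direction of (actual - expected), so scoring below expectation pulls Fruit down and Fritz up even though Fruit won the match.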

Fruit 2.2 Uri vs. Shredder 9 Columbus'egg 9g --- 27.5:13.5 !!!

A devastating result. Shredder 9 Columbus'egg 9g was rated only a bit below the
default setting, at 2750, before this match; now it has dropped to 2698. Fruit's Elo
performance in this match alone is 2832, so this should give Fruit 2.2 Uri a big
push in Elo. It does not. Why not? Because there were only a few games with the
Columbus setting before this match (only 110 games), and I was keen on having
results for both. As far as Fruit 2.2 Uri is concerned, running this match was a
(temporary) mistake, but more games with Columbus'egg will follow now, and when this
setting does better against other engines and again comes close to default, you
can imagine that this will give Fruit 2.2 Uri a push too, and a big one.
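For reference, a common textbook way to compute a performance rating is to invert the logistic expected-score curve. This sketch assumes a single opponent rating and the plain logistic model; the rating program behind the CEGT list may use a different opponent average or model, so it need not reproduce the 2832 quoted above exactly. It also shows that a perfect score (like the 10:0 against Naum further down) has no finite performance under this formula, so any figure reported for such a result depends on how the rating tool handles it. `performance_rating` is a hypothetical helper name.

```python
import math


def performance_rating(points: float, games: int, avg_opp: float) -> float:
    """Logistic-model performance: the rating at which `points` out of `games`
    would be the expected result against opponents averaging `avg_opp`."""
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("performance is unbounded for a 0% or 100% score")
    return avg_opp + 400.0 * math.log10(p / (1.0 - p))


# Fruit 2.2 Uri vs. Shredder 9 Columbus'egg 9g: 27.5 of 41 against an opponent
# rated 2750 before the match and 2698 after it.
for opp in (2750.0, 2698.0):
    print(f"opponent at {opp:.0f}: performance {performance_rating(27.5, 41, opp):.0f}")
```

The opponent rating enters additively, so every point Columbus'egg regains in later matches feeds straight back into Fruit's performance figure from this match.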

Let us continue. The following matches were played against stronger amateurs on
the same machine, under the same GUI, with Fruit WCCC'05 History Threshold 50,
not prone to the altered parameters bug:

- against Scorpio 1.3 (2570 after the match): 90% (+16 =4 -0); Elo performance
here is 2890

- against ET Chess 300805 (2543): Elo performance of only 2727 after 21 games

Next machine (again two matches):

- against Jonny 2.82: 63.1% out of 42 games and a miserable Elo performance of
2687???

- against Naum 1.82 on the same machine: an Elo performance of 3001 after an
initial result of 8:0. Meanwhile the score here is 10:0.

- and against WildCat: an initial Elo performance of 2838 after 10 games

Among the results from other CEGT testers, the one that hurt most was against
Spike, with an Elo performance of only 2661.

I can already predict that the rating will shoot up as soon as I have more games
with Columbus'egg G9.

The games will be available for download this Tuesday evening, with comments, and
you can all check that there is nothing unusual. My choice of opponents, however,
was not the best with regard to Columbus'egg, but this will be corrected, as I will
now first run a few more matches with that setting.

Comments welcome.

Best Regards
Heinz

http://www.husvankempen.de/nunn/




