Author: Christophe Theron
Date: 19:26:49 03/19/00
On March 19, 2000 at 15:41:30, Bertil Eklund wrote:
>
>Hi!
>
>A very impressive result from Shredder4.
>
>IMO Shredder plays very well positionally and is excellent in endgames.
>Nimzo is a bit stronger tactically.
>
>Shredder4 used all 4 Turbo-CDs.
>
>Bertil
Bertil,
I am not sure this message is going to be well accepted. So let me first state
that I have the greatest respect for your work and the SSDF.
Let me also state that I have a lot of respect for Nimzo, Shredder, and their
respective authors.
However, I can only strongly disagree with your sentence "a very impressive
result from Shredder4".
You have played a 40-game match. Under these conditions, and given the result
(23-17 in favor of Shredder), it is absolutely impossible to say with 95%
confidence that Shredder is stronger than Nimzo. It is not even possible to say
it with 80% confidence.
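To make the 95% and 80% figures concrete, here is a minimal Python sketch (my own illustration, not an SSDF formula) of a two-sided normal-approximation confidence interval on the match score. The post does not give the number of draws, so the example assumes 23 wins and 17 losses; that is the highest-variance way to reach 23-17, so any draws would narrow the interval somewhat. The function name `score_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def score_interval(wins, draws, losses, confidence):
    """Two-sided normal-approximation confidence interval for the mean
    per-game score (win = 1, draw = 0.5, loss = 0)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Observed per-game variance: E[x^2] - mean^2
    var = (wins + 0.25 * draws) / n - mean ** 2
    se = sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se


# Assumption: 23 wins, 0 draws, 17 losses (the draw count is not reported).
for conf in (0.95, 0.80):
    lo, hi = score_interval(23, 0, 17, conf)
    print(f"{conf:.0%} interval: [{lo:.3f}, {hi:.3f}], contains 50%: {lo <= 0.5 <= hi}")
```

Under these assumptions both the 95% and the 80% intervals contain 50%, i.e. an even match between the two programs cannot be ruled out at either level.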
So saying that it is "a very impressive result from Shredder4" is, to say the
least, quite a stretch. Unless you were assuming that Shredder was weak, but
you weren't, were you?
I know my remarks here could be seen as being in bad taste. I just want, as I
have done several times in the past, to introduce a little more good sense
into the interpretation of results.
I have already seen people claiming that program X was better than program Y
because X won against Y by 6-4 in a 10-game match. This is pure nonsense, of
course (well, I say "of course", but do people know why?).
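One way to see why, as a sketch under simplifying assumptions (every game decisive, games independent, and both programs exactly equal in strength): compute how often luck alone produces a 6-4 or better score. The helper `prob_at_least` is hypothetical and uses a plain binomial model.

```python
from math import comb


def prob_at_least(points, games, p=0.5):
    """Binomial tail: probability of winning at least `points` of `games`
    decisive games when each game is won independently with probability p."""
    return sum(
        comb(games, k) * p ** k * (1 - p) ** (games - k)
        for k in range(points, games + 1)
    )


# Two equally strong programs (p = 0.5), 10 decisive games.
one_side = prob_at_least(6, 10)
either_side = 2 * one_side  # 6-4 or better for X, or for Y (symmetric when p = 0.5)
print(f"X scores 6-4 or better by luck: {one_side:.3f}")
print(f"someone scores 6-4 or better:   {either_side:.3f}")
```

This gives 386/1024, about 38%: more than a third of such matches would end 6-4 or better for X even if X and Y were of identical strength, and about 75% would end at least 6-4 for one side or the other.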
Similarly, a 23-17 result is not significant (at least not significant enough to
qualify it as an "impressive" victory), unless you are willing to take a big
risk in your statement. That's exactly why the SSDF insists that program
rankings are published together with the confidence intervals computed for
them.
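SSDF ratings are expressed in Elo points, so it can help to translate the score into a rating difference. The sketch below uses the standard logistic Elo relation D = -400 log10(1/s - 1); treat it as an illustration only, since the SSDF's own rating computation may differ in detail. It assumes 23 wins and 17 losses (the post does not give the draw count) and a normal approximation for the 95% interval on the score; `elo_diff` is a hypothetical helper name.

```python
from math import log10, sqrt
from statistics import NormalDist


def elo_diff(score):
    """Rating difference implied by an expected score (logistic Elo model)."""
    return -400 * log10(1 / score - 1)


# Assumptions: 23 wins and 17 losses (no draws), 95% normal-approximation
# interval on the mean score, then each endpoint mapped to Elo
# (valid because elo_diff is monotonic in the score).
n, s = 40, 23 / 40
half_width = NormalDist().inv_cdf(0.975) * sqrt(s * (1 - s) / n)
print(f"point estimate: {elo_diff(s):+.0f} Elo")
print(f"95% interval:   {elo_diff(s - half_width):+.0f} to "
      f"{elo_diff(s + half_width):+.0f} Elo")
```

Under those assumptions the point estimate is about +53 Elo for Shredder, but the 95% interval runs from roughly -55 Elo (Nimzo stronger) to roughly +171 Elo.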
This is not to say that your match is worthless. This is not to say that
Shredder is not stronger than Nimzo (actually I do not know, I don't even own
these programs). When this result is added to the other matches played by
Shredder and Nimzo, we will get a much better picture (and much better
confidence) of the respective strength of these programs.
I hope my remark will not be interpreted negatively. I think the topic of
confidence intervals on match results deserves much more attention from
computer chess enthusiasts, and I regret that it is not discussed more
often here on CCC.
And Bertil, keep up the good work!
Christophe
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.