Computer Chess Club Archives


Subject: Re: The Ruffian test after 43 games by each engine

Author: Dann Corbit

Date: 11:12:28 03/03/04


On March 03, 2004 at 01:18:19, Peter Skinner wrote:

>On March 02, 2004 at 21:13:51, Dann Corbit wrote:
>
>>Notice these entries from the SSDF list:
>>                                           Rating +   - Games Won Av.opp
>>11 Chess Tiger 15.0 256MB Athlon 1200 MHz  2719   23 -22 968  59% 2655
>>...
>>13 Chess Tiger 14.0 CB 256MB Athlon 1200   2717   30 -30 557  61% 2638
>>
>>The ratings are very close.  I imagine that the evaluations will be similar.
>>Does that somehow indicate fraud to you?
>
>It would depend. If version 15.0 was advertised as a "50 point elo" increase
>from 14.0, then yes I would consider that fraud.
>
>>
>>And now look at this:
>>                                           Rating +   - Games Won Av.opp
>>25 Gandalf 4.32h 256MB Athlon 1200 MHz     2658   31 -31 514  53% 2635
>>...
>>27 Gandalf 5.0 256MB Athlon 1200 MHz       2649   45 -46 242  44% 2692
>>28 Gandalf 5.1 256MB Athlon 1200 MHz       2637   25 -25 758  55% 2604
>>
>>Notice that newer versions may even be slightly weaker than older versions
>>(though the difference is not statistically significant).  Does that indicate
>>fraud to you?
>
>See above answers...
>
>
>>All that it means to me is that it is very difficult to make a strong program
>>stronger.  I am sure that an author who makes a new release of his program
>>imagines it to be better, and significantly so.  The testing done by an author
>>may not get the same results as the testing done by an independent
>>organization.
>>
>>In my view, falsely accusing someone of fraud is as bad as committing fraud.
>>
>>Hinting that someone may have committed fraud is not as bad as that.  But it
>>still is not a very pleasant thing to do.
>>
>>IMO-YMMV.
>
>I have not once said that I think he did. I was looking at data that does
>suggest something _could_ be awry. I did state that I did not think so, and I
>_hoped_ it wasn't the case.

Yes, you did not directly accuse him.

>Personally I love proving "advertising" wrong. It is sort of a hobby. I hate
>advertising that is misleading, and I have even gone as far as to quit a job
>because of the bad advertising that company did.

All of it is bogus.  I think if you read any chess program box, 50% of the
claims are misleading crap.  That is the nature of advertising.

>I believe it was Frank Quisinsky who stated here in this very forum that Ruffian
>2.0.0 was "100 elo" better than 1.0.5. That was _obviously_ misleading, and
>completely untrue.

Well, Frank is a very optimistic guy.  I expect he played 30 games and drew
early conclusions.  It may also be true that under his testing conditions the
program does play 100 Elo better.  I saw his claims.
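
To illustrate why 30 games is thin evidence for a 100 Elo claim, here is a minimal Python sketch. It assumes the standard logistic Elo model and a normal approximation for the error of the match score; the function names and the 15 wins / 9 draws / 6 losses result are hypothetical, not Frank's actual data or method.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) Elo difference for a match result.

    Assumption: normal approximation on the mean score; z=1.96 gives
    roughly a 95% interval. This is a sketch, not a rating-list method.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, counting win=1, draw=0.5, loss=0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    # Clamp so the logarithm stays defined for lopsided results.
    low = max(score - z * se, 1e-6)
    high = min(score + z * se, 1.0 - 1e-6)
    return elo_diff(score), elo_diff(low), elo_diff(high)

# Hypothetical 30-game match scoring 65%.
estimate, low, high = elo_interval(15, 9, 6)
print(f"estimate {estimate:+.0f} Elo, interval {low:+.0f} to {high:+.0f}")
```

With a sample that small the interval is very wide, so a result that looks like +100 Elo is also compatible with a gain close to zero.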

>It does not take a rocket scientist to look through the advertising, the
>optimizations, the comments of a new evaluation technique to see that certain
>free versions come to the same conclusion as their commercial counterparts. It
>also is reasonable to conclude that the commercial versions are not indeed 100
>elo better than the free counterparts.
>
>Certainly there is confusion why from 2.0.0 we now have two upgrades, smaller in
>exe size, yet all seem to suffer from the ponder bug. Even the older free
>versions have the same bug. How does one go from 1.0.1 to 2.1.0 without fixing
>that bug? It puzzles me.
>
>The new versions could be just that, but there is some evidence that they are
>not. Whether that evidence is conclusive has yet to be seen.
>
>I have gone on record as stating I am not accusing Per-Ola of anything, as I
>have spoken with him online and I don't think he would do something like this.
>
>Peter.



Last modified: Thu, 15 Apr 21 08:11:13 -0700