Computer Chess Club Archives

Subject: Re: The Ruffian test after 43 games by each engine

Author: Peter Skinner

Date: 22:18:19 03/02/04

On March 02, 2004 at 21:13:51, Dann Corbit wrote:

>Notice these entries from the SSDF list:
>                                           Rating   +   - Games Won Av.opp
>11 Chess Tiger 15.0 256MB Athlon 1200 MHz    2719  23 -22   968 59%   2655
>...
>13 Chess Tiger 14.0 CB 256MB Athlon 1200     2717  30 -30   557 61%   2638
>
>The ratings are very close.  I imagine that the evaluations will be similar.
>Does that somehow indicate fraud to you?

It would depend. If version 15.0 was advertised as a "50 point elo" increase
from 14.0, then yes I would consider that fraud.

>
>And now look at this:
>25 Gandalf 4.32h 256MB Athlon 1200 MHz  2658  31 -31  514  53%  2635
>...
>27 Gandalf 5.0 256MB Athlon 1200 MHz    2649  45 -46  242  44%  2692
>28 Gandalf 5.1 256MB Athlon 1200 MHz    2637  25 -25  758  55%  2604
>
>Notice that newer versions may even be slightly weaker than older versions
>(though the difference is not statistically significant).  Does that indicate
>fraud to you?

See above answers...
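
The "not statistically significant" remark in the quoted text can be checked roughly from the quoted SSDF lines. The sketch below assumes the + and - columns are the list's error margins around each rating, and it uses a simple interval-overlap check, which is only a rough screen and not a formal significance test. The names `entries`, `interval`, and `overlaps` are made up for this illustration.

```python
# Ratings and margins copied from the SSDF lines quoted above.
# Each value is (rating, plus_margin, minus_margin); treating the
# +/- columns as error margins is an assumption of this sketch.
entries = {
    "Chess Tiger 15.0": (2719, 23, 22),
    "Chess Tiger 14.0 CB": (2717, 30, 30),
    "Gandalf 4.32h": (2658, 31, 31),
    "Gandalf 5.0": (2649, 45, 46),
    "Gandalf 5.1": (2637, 25, 25),
}


def interval(name):
    """Return the (low, high) rating range implied by the margins."""
    rating, plus, minus = entries[name]
    return rating - minus, rating + plus


def overlaps(name_a, name_b):
    """True if the two rating ranges share at least one value."""
    low_a, high_a = interval(name_a)
    low_b, high_b = interval(name_b)
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    pairs = [
        ("Chess Tiger 15.0", "Chess Tiger 14.0 CB"),
        ("Gandalf 4.32h", "Gandalf 5.1"),
        ("Gandalf 5.0", "Gandalf 5.1"),
    ]
    for a, b in pairs:
        verdict = "overlap" if overlaps(a, b) else "do not overlap"
        print(f"{a} {interval(a)} vs {b} {interval(b)}: {verdict}")
```

When two ranges overlap, the list alone cannot tell the versions apart, which is the situation for each pair of versions quoted above.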

>All that it means to me is that it is very difficult to make a strong program
>stronger.  I am sure that an author who makes a new release of his program
>imagines it to be better, and significantly so.  The testing done by an author
>may not get the same results as the testing done by an independent
>organization.
>
>In my view, falsely accusing someone of fraud is as bad as committing fraud.
>
>Hinting that someone may have committed fraud is not as bad as that.  But it
>still is not a very pleasant thing to do.
>
>IMO-YMMV.

I have not once said that I think he did. I was looking at data that does
suggest something _could_ be awry. I did state that I did not think so, and I
_hoped_ it wasn't the case.

Personally I love proving "advertising" wrong. It is sort of a hobby. I hate
advertising that is misleading, and I have even gone as far as to quit a job
because of the bad advertising that company did.

I believe it was Frank Quinsky who stated here in this very forum that Ruffian
2.0.0 was "100 elo" better than 1.0.5. That was _obviously_ misleading, and
completely untrue.

It does not take a rocket scientist to look past the advertising, the
optimizations, and the comments about a new evaluation technique and see that
certain free versions come to the same conclusions as their commercial
counterparts. It is also reasonable to conclude that the commercial versions are
not in fact 100 elo better than their free counterparts.

Certainly there is confusion about why, since 2.0.0, we now have two upgrades,
smaller in exe size, yet all seem to suffer from the ponder bug. Even the older
free versions have the same bug. How does one go from 1.0.1 to 2.1.0 without
fixing that bug? It puzzles me.

The new versions could be just that, but there is some evidence that they are
not. Whether that evidence is conclusive remains to be seen.

I have gone on record as stating that I am not accusing Per-Ola of anything, as
I have spoken with him online and I don't think he would do something like this.

Peter.


