Computer Chess Club Archives


Subject: Re: The Ruffian test after 43 games by each engine

Author: Dann Corbit

Date: 18:13:51 03/02/04


On March 02, 2004 at 19:37:19, Peter Skinner wrote:
>I have tested, and I have read all the testing others have done, and the same
>data always seems to come forward:
>
>1. Ruffian 1.0.1 finishing within a single point of 2.1.0. Usually it happens to
>be a .5 point difference.
>
>2. Ruffian 2.0.2 and 1.0.5 seem to finish within 1 to 1.5 point of each other.
>
>3. Very few test results have shown 2.1.0 or 1.0.5 to be stronger than 2.0.2,
>and 1.0.5 respectively. I know in the Ridderk tournament 1.0.5 did finish lower
>than 1.0.1, but that was only by 4 points.. luck could have been a contributing
>factor.
>
>4. When analyzing positions with those 4 versions, 2.0.2 and 1.0.5 come out to
>the same result, just 2.0.2 does it quicker. Same goes when analyzing with
>1.0.1/2.1.0.
>
>5. Personally I don't believe Per-Ola would do something like this, but the data
>does speak volumes. It is hard to just toss it aside.
>
>I do want to go on record and state that I don't believe this to be the case, or
>rather I am seriously hoping this is not the case. It would constitute a major
>fraud..

From here:
http://www.scharlesassociates.com/cases/psifraud-definefraud.htm

We have this:
"Fraud; the intentional use of deceit, a trick or some dishonest means to
deprive another of his/her/its money, property or a legal right. A party who has
lost something due to fraud is entitled to file a lawsuit for damages against
the party acting fraudulently, and the damages may include punitive damages as a
punishment or public example due to the malicious nature of the fraud. Quite
often there are several persons involved in a scheme to commit fraud and each
and all may be liable for the total damages. Inherent in fraud is an unjust
advantage over another which injures that person or entity. It includes failing
to point out a known mistake in a contract or other writing (such as a deed), or
not revealing a fact which he/she has a duty to communicate, such as a survey
which shows there are only 10 acres of land being purchased and not 20 as
originally understood. Constructive fraud can be proved by a showing of breach
of legal duty (like using the trust funds held for another in an investment in
one's own business) without direct proof of fraud or fraudulent intent.
Extrinsic fraud occurs when deceit is employed to keep someone from exercising a
right, such as a fair trial, by hiding evidence or misleading the opposing party
in a lawsuit. Since fraud is intended to employ dishonesty to deprive another of
money, property or a right, it can also be a crime for which the fraudulent
person(s) can be charged, tried and convicted. Borderline overreaching or taking
advantage of another's naiveté involving smaller amounts is often overlooked by
law enforcement, which suggests the victim seek a "civil remedy" (i.e., sue).
However, increasingly fraud, which has victimized a large segment of the public
(even in individually small amounts), has become the target of consumer fraud
divisions in the offices of district attorneys and attorneys general."

Fraud has not been committed, regardless of whether or not the binaries are
close derivatives of each other.

Notice these entries from the SSDF list:
                                           Rating    +    -  Games  Won  Av.opp
11 Chess Tiger 15.0 256MB Athlon 1200 MHz    2719   23  -22    968  59%    2655
...
13 Chess Tiger 14.0 CB 256MB Athlon 1200     2717   30  -30    557  61%    2638

The ratings are very close.  I imagine that the evaluations will be similar.
Does that somehow indicate fraud to you?

And now look at this:
                                           Rating    +    -  Games  Won  Av.opp
25 Gandalf 4.32h 256MB Athlon 1200 MHz       2658   31  -31    514  53%    2635
...
27 Gandalf 5.0 256MB Athlon 1200 MHz         2649   45  -46    242  44%    2692
28 Gandalf 5.1 256MB Athlon 1200 MHz         2637   25  -25    758  55%    2604

Notice that newer versions may even be slightly weaker than older versions
(though the difference is not statistically significant).  Does that indicate
fraud to you?
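
To make the "not statistically significant" point concrete, here is a rough sketch that checks whether the rating ranges of the entries quoted above overlap. It assumes the + and - columns are error margins around each rating, and the function names are made up for illustration. Overlapping ranges are only a quick sanity check, not a formal significance test.

```python
# Figures copied from the SSDF rows quoted above: (rating, plus, minus).
# Assumption: plus/minus are error margins around the rating.
entries = {
    "Chess Tiger 15.0": (2719, 23, 22),
    "Chess Tiger 14.0 CB": (2717, 30, 30),
    "Gandalf 4.32h": (2658, 31, 31),
    "Gandalf 5.0": (2649, 45, 46),
    "Gandalf 5.1": (2637, 25, 25),
}


def interval(name):
    """Return the (low, high) rating range for one entry."""
    rating, plus, minus = entries[name]
    return rating - minus, rating + plus


def intervals_overlap(a, b):
    """True if the rating ranges of entries a and b share any value."""
    lo_a, hi_a = interval(a)
    lo_b, hi_b = interval(b)
    return lo_a <= hi_b and lo_b <= hi_a


pairs = [
    ("Chess Tiger 15.0", "Chess Tiger 14.0 CB"),
    ("Gandalf 4.32h", "Gandalf 5.0"),
    ("Gandalf 4.32h", "Gandalf 5.1"),
]

for a, b in pairs:
    verdict = "overlap" if intervals_overlap(a, b) else "do not overlap"
    print(f"{a} {interval(a)} vs {b} {interval(b)}: ranges {verdict}")
```

When the ranges overlap, the list by itself gives no solid ground for saying which version is stronger.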

All that it means to me is that it is very difficult to make a strong program
stronger.  I am sure that an author who makes a new release of his program
imagines it to be better, and significantly so.  The testing done by an author
may not get the same results as the testing done by an independent organization.
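
One reason two sets of test results can disagree is sample size. The sketch below uses the standard Elo expected-score formula and a deliberately pessimistic error estimate to show how wide the uncertainty is for a given number of games. The assumptions are mine, not from any tester's actual method: every game is treated as independent, the per-game variance is taken at its maximum of 0.25 (draws make it smaller in practice), and the 1.96 factor is the usual two-sided 95% normal approximation. The helper names are hypothetical.

```python
import math


def expected_score(elo_diff):
    """Standard Elo expected score for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Inverse of expected_score: Elo difference implied by a score fraction."""
    return 400.0 * math.log10(score / (1.0 - score))


def score_margin(games, z=1.96):
    """Worst-case margin on the score fraction after `games` games.

    Assumes independent games and the maximum per-game variance of 0.25.
    """
    return z * 0.5 / math.sqrt(games)


def elo_margin(games, z=1.96):
    """Approximate Elo margin around an even (50%) result."""
    return elo_from_score(0.5 + score_margin(games, z))


# 43 is the game count in the thread subject; 242 and 968 are the smallest
# and largest game counts in the SSDF rows quoted above.
for games in (43, 242, 968):
    print(f"{games:4d} games: about +/- {elo_margin(games):.0f} Elo")
```

Under these assumptions a short match cannot separate two versions that differ by only a few rating points, which fits the observation that small differences between versions tell you very little.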

In my view, falsely accusing someone of fraud is as bad as committing fraud.

Hinting that someone may have committed fraud is not as bad as that.  But it
still is not a very pleasant thing to do.

IMO-YMMV.



Last modified: Thu, 15 Apr 21 08:11:13 -0700
