Computer Chess Club Archives

Subject: Re: Fritz losing to Shredder

Author: Chris Carson
Date: 12:18:24 06/16/99
On June 16, 1999 at 14:57:47, Eugene Nalimov wrote:

>On June 16, 1999 at 14:52:47, Melvin S. Schwartz wrote:
>
>>
>>On June 16, 1999 at 13:28:24, Dan Homan wrote:
>>
>>>On June 15, 1999 at 23:47:07, Melvin S. Schwartz wrote:
>>>
>>>>
>>>>I disagree. They're running programs on different hardware and that doesn't make
>>>>for intelligent evaluations of program vs. program. Furthermore, I didn't say
>>>>they shouldn't do it, but rather what is to be accomplished by testing program
>>>>against program on various types of hardware that is not of equal stature. They
>>>>can do it - but is it truly meaningful???
>>>>
>>>
>>>Depends on what you mean by meaningful.  This contest is to find the
>>>best artificial chess player.  I think that is pretty meaningful.
>>>
>>>Notice that I said "player" not program.  Clever algorithms are only
>>>one component of a chess player.   Hardware is important too.  Some
>>>artificial players use special purpose hardware.... Deep Blue for
>>>example.  The question is: "What is the best artificial player?"
>>>
>>>Now, if you want to use the results to say something about the
>>>relative strength of the algorithms you can buy for your home
>>>computer, you are out of luck....  The results from this contest are
>>>not meaningful in that particular way, but they are meaningful
>>>in other ways.
>>>
>>>If you still are doubtful, we could turn this around.  Suppose that
>>>you have organized a tournament.  In your tournament all the same
>>>kinds of computers are used and all the newest commercial software
>>>is playing.  Now, I could criticize your tournament as not being
>>>meaningful because it doesn't tell us what the best "artificial
>>>chess player" is.  By not including other kinds of artificial chess
>>>players and other types of hardware, I could say that your results
>>>were tainted.
>>>
>>>If I said these things about your hypothetical tournament, I would be
>>>dead wrong because I would be putting my meaning into your results
>>>rather than looking at what you were trying to do.  Your results would
>>>tell us which commercially available program is best on the hardware
>>>you selected.
>>
>>Hello Dan,
>>
>>If the programs were running on the same type of hardware, I believe that would
>>yield results which could be intelligently evaluated. If you run program A at
>>600 MHz and program B at 200 MHz, what possible intellectual conclusion could you
>>come to if program A defeated program B?
>>
>>Mel
>>>
>>> - Dan
>>>
>>>
>>>>Mel
>>>>
>>>>>TP
>
>I sent a long message about that several weeks ago. I'd recommend that you find it
>and read it - there are some arguments there.
>
>BTW, would you be happy if the organizers gave each participant a quad
>Xeon/550?
>
>Eugene

This is not a uniform platform event.  This is a find-the-best SW/HW
combination event.  None of the WCCC or WMCCC events have been uniform, so you
cannot do an absolute program A vs. program B comparison from this event.  A
uniform platform event would be nice, but all you could say is that program X
won: there are not enough games to say with any reliability that program X was
the strongest on the selected platform.  The SSDF list is the closest we have
to a uniform platform ranking.  You can compare SW, HW, or both to get an idea
of relative strengths, but the ratings carry some measure of error, which the
SSDF publishes with them.  If I recall correctly, if you add and subtract
2 Standard Errors of Measurement (SEM), you have 95% confidence that the true
rating (established within the population tested) lies in the range:
rating-2*SEM < true rating of program X < rating+2*SEM.
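
To make the arithmetic concrete, here is a minimal Python sketch of that range. The function names and the rating/SEM numbers are made up for illustration; they are not SSDF figures or anything the SSDF publishes in this form. The overlap check is only a rough rule of thumb: if two programs' ranges overlap, the list alone does not clearly separate them.

```python
def rating_interval(rating, sem, k=2.0):
    """Return (low, high) = rating -/+ k*SEM.

    k=2 corresponds to the roughly 95% range described above.
    """
    margin = k * sem
    return rating - margin, rating + margin


def intervals_overlap(a, b):
    """True if two (low, high) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical ratings and SEMs, chosen only to show the calculation.
program_x = rating_interval(2500, 25)
program_y = rating_interval(2460, 30)

print("X:", program_x)
print("Y:", program_y)
print("ranges overlap:", intervals_overlap(program_x, program_y))
```

With these made-up numbers the two ranges overlap, so a 40-point gap in the list would not by itself show that X is the stronger program.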

Best Regards,
Chris Carson
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.