Computer Chess Club Archives



Subject: Re: Let's talk about fraud.

Author: Robert Hyatt

Date: 08:42:19 05/04/04



On May 04, 2004 at 07:11:15, martin fierz wrote:

>On May 03, 2004 at 22:50:58, Robert Hyatt wrote:
>
>>On May 03, 2004 at 19:04:34, martin fierz wrote:
>>
>>>On May 03, 2004 at 11:51:24, Gian-Carlo Pascutto wrote:
>>>
>>>>On May 03, 2004 at 11:04:59, Anthony Cozzie wrote:
>>>>
>>>>>As a physicist, you consider all numbers within an order of magnitude as equal
>>>>>;)
>>>>
>>>>Then you are not a physicist, but an engineer :)
>>>
>>>not at all - engineers care about exact numbers. else everything fails (e.g. all
>>>kinds of mars probes, ariane rockets, bridges, buildings and much much more,
>>>because exact numbers ARE important in engineering).
>>>
>>>
>>>>As a physicist, you care first and foremost about the error analysis of
>>>>the results (which immediately allows you to conclude whether they are
>>>>identical or not).
>>>
>>>that's not what physics is about. error analysis is important for sure, but
>>>never "first and foremost".
>>>
>>>>Ever seen any error margins in a computer chess paper?
>>>
>>>in fact yes - ernst heinz used to do stuff on statistical significance of some
>>>sort, IIRC it was whether you could conclude that one engine was stronger than
>>>another based on tournament results and rating computations. also, IIRC, his
>>>statistics were wrong :-) (IIRC he didn't seem to appreciate that if you have
>>>A+dA and B+dB, then the difference A-B does NOT have the error dA+dB). lots of
>>>IIRCs here, an old man's memory can easily be wrong...
>>>
>>>but in this context it would be interesting to know whether the number reported
>>>by bob (3.1) and those others floating around (3.0, 2.8) have any kind of error
>>>estimate. don't really understand who exactly floats those other numbers
>>>(vincent? you? both of you? anybody else?), don't really care.
>>>generally, if you give a number as %.1f educated people will assume that it has
>>>at least an error of +-0.1, making the numbers 3.1 and 3.0 compatible. and
>>>making the numbers 3.1 and 2.8 nearly compatible, if you think of 0.1 as
>>>one-sigma. it's you and bob who gave those numbers, it would be nice if you guys
>>>also gave an error estimate on those numbers, because if you are going to say
>>>0.1, we can just drop the entire thread.
>>
>>If you recall, I _have_ given some error estimates in the past.   Remember the
>>wildly varying speedup numbers I showed you the first time this issue came up?
>
>i recall that you gave wildly varying speedup numbers, and an explanation for
>why this happens. i  don't recall a real error estimate, but that can be either
>because
>-> you gave one and i didn't see it
>-> you gave one, i saw it and forgot
>-> you didn't give one at all
>
>so... what kind of numbers would you give if you were pressed?


In general, I wouldn't.

For the same reason I avoid giving specific speedup numbers, unless I quote them
for a specific set of test positions.  Some positions will produce the same
speedup (to within 0.1) over and over and over.  Others will vary by a factor
of 2 (or more) from one run to the next.  I.e., the only accurate number anyone
could quote might look like this:

with 4 cpus, the speedup varies from 1.0 (none) to 6.2, with an average of 2.7.

There isn't a lot of info there.  That is why I usually try to present the
speedup for a test set in two parts.  First, position by position, and then
aggregate for the entire test set.  The problem is that this is all very
non-deterministic.  Some of these positions will produce a different number
every time, making it very difficult for us to compare numbers even for the same
program, on the same hardware, on the same problem set, much less when one of
those varies.

I'm going to eventually write a paper on Crafty's parallel search.  I'll try to
include that very analysis, since I always run each test multiple times anyway.
I'll add a "standard deviation" column for both position by position and for the
entire test set.  That will give what you are asking for.  Note that it will
be valid for _that_ test set only, not for the "in general" case...
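
A rough sketch of what such a table might compute, in Python.  The position
names and speedup values below are invented for illustration and are not
measurements from Crafty.  The aggregate here is an assumption as well: it
averages the per-position speedups within each run and then takes the mean and
standard deviation across runs.  A time-weighted aggregate over the whole test
set would be an equally reasonable choice.

```python
from statistics import mean, stdev

# Hypothetical data: 4-CPU speedup for each test position, one value per
# repeated run.  These numbers are made up, not measured.
runs = {
    "pos1": [3.1, 3.0, 3.2],   # stable position
    "pos2": [1.0, 2.9, 6.2],   # wildly varying position
    "pos3": [2.6, 2.7, 2.8],
}

def per_position(runs):
    """Map each position to (mean speedup, sample standard deviation)."""
    return {name: (mean(vals), stdev(vals)) for name, vals in runs.items()}

def aggregate(runs):
    """Mean and sample standard deviation of the per-run average speedup.

    Assumes every position was run the same number of times, so that
    zip() can regroup the values run by run.
    """
    per_run = [mean(run) for run in zip(*runs.values())]
    return mean(per_run), stdev(per_run)

if __name__ == "__main__":
    print(f"{'position':<10}{'mean':>8}{'std dev':>10}")
    for name, (m, sd) in per_position(runs).items():
        print(f"{name:<10}{m:>8.2f}{sd:>10.2f}")
    m, sd = aggregate(runs)
    print(f"{'test set':<10}{m:>8.2f}{sd:>10.2f}")
```

With data like this, the stable positions show a small standard deviation and
the erratic one dominates the spread of the aggregate, which is why a single
averaged number says so little without the per-position column next to it.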


>
>cheers
>  martin




Last modified: Thu, 15 Apr 21 08:11:13 -0700
