Computer Chess Club Archives


Subject: Re: Hydra node speed from CSS forum

Author: Vasik Rajlich

Date: 03:43:40 08/31/04


On August 30, 2004 at 19:43:15, Mark Young wrote:

>On August 30, 2004 at 19:22:16, James T. Walker wrote:
>
>>On August 30, 2004 at 16:39:22, Mark Young wrote:
>>
>>>On August 30, 2004 at 15:33:19, Robert Hyatt wrote:
>>>
>>>>On August 30, 2004 at 14:51:01, Uri Blass wrote:
>>>>
>>>>>On August 30, 2004 at 13:51:48, Robert Hyatt wrote:
>>>>>
>>>>>>On August 30, 2004 at 12:24:54, Volker Böhm wrote:
>>>>>>
>>>>>>>On August 30, 2004 at 10:02:54, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On August 30, 2004 at 08:30:34, Kurt Utzinger wrote:
>>>>>>>>
>>>>>>>>>On August 30, 2004 at 08:12:52, Jouni Uski wrote:
>>>>>>>>>
>>>>>>>>>>One FPGA card currently examines approx. 3 million positions/second. 16
>>>>>>>>>>cards therefore theoretically reach 48 MPos/sec. (Donninger)
>>>>>>>>>>
>>>>>>>>>>Jouni
>>>>>>>>>
>>>>>>>>>      If Hydra reached 48 Mpos/sec, this again proves (compared
>>>>>>>>>      with Shredder's 2 Mpos/sec on a quad-Opteron server with
>>>>>>>>>      4 CPUs) that the number of pos/sec can't be taken as a
>>>>>>>>>      reliable measure of the quality of a chess program. It's
>>>>>>>>>      of course simply impossible to compare apples and oranges.
>>>>>>>>>      Kurt [http://www.utzingerk.com]
>>>>>>>>
>>>>>>>>
>>>>>>>>Don't forget that Hydra ripped Shredder's head off.  So the NPS _might_ be
>>>>>>>>significant here...
>>>>>>>
>>>>>>>Didn't I hear you say that 10 games are not enough to draw a
>>>>>>>statistically significant conclusion about playing strength?
>>>>>>>
>>>>>>>Greetings Volker
>>>>>>
>>>>>>
>>>>>>With two _close_ opponents, correct.  But if one is seriously stronger, as Hydra
>>>>>>appeared to be, 10 games is plenty.
>>>>>
>>>>>We do not know if Hydra is seriously stronger.
>>>>
>>>>We have a pretty good clue that it is.  It is over 10x faster, potentially, than
>>>>other programs.
>>>>
>>>>1. I first assume that the programmer / designer is no dummy.
>>>>
>>>>2.  All else being "equal", 10x faster is a _serious_ advantage.
>>>>
>>>>3.  The above two points translate into a significant strength advantage.
>>>>
>>>>
>>>>>
>>>>>You cannot start by assuming that Hydra is significantly stronger when this is
>>>>>the question.
>>>>
>>>>With evidence, you can.  I.e., I can certainly assume that Crafty on an 8-way
>>>>Opteron is significantly stronger than Crafty on my dual Xeon.
>>>>
>>>>
>>>>>
>>>>>If you see 10-0 you can say based on the result that Hydra is significantly
>>>>>stronger but when you see 5.5-2.5 you cannot claim it based on the result and
>>>>>you only can say that you do not know if it is significantly stronger based on
>>>>>the result.
>>>>>
>>>>
>>>>
>>>>If you only look at the results, maybe or maybe not.  But I watched many of the
>>>>games with Crafty analyzing.  That tells you even more.
>>>
>>>Common sense should tell us that Hydra is stronger. It should have a big
>>>hardware advantage. I think this is your point, and I agree. But I still need
>>>more data to be sure. Right now there is only a 1 in 6 chance that Hydra is the
>>>stronger program based on the games alone.
>>
>>***************
>>Where do you get a 1 in 6 chance that Hydra is stronger???  If that's true then
>>what are the odds that Shredder is stronger??  Maybe 5 in 6 ???
>>****************
>>
>
>You must understand.... The odds are so low because not many games were played.
>Now if Hydra had won 8-0 or 7-1, then you would have a much higher certainty. An
>8-0 score shows more than 100 to 1 against Shredder being stronger. A score of 7-1
>would give 20 to 1 against Shredder being stronger. Now when you have a match
>score of 5.5-2.5, it tells you almost nothing. It is close to being 50-50
>odds or, to be exact, 65 to 45 that Hydra is stronger.
>

These "mathematical" tables are so abused that it would probably be better if
they didn't exist.

For example:

A 5.5-2.5 self-play result, with two similar versions no more than five rating
points apart, does not allow you to draw a statistically significant conclusion.
OTOH, if you know that two programs differ in strength by 200 points, then a
5.5-2.5 result will allow you to identify the stronger side with near-100% certainty.

If you insist on having a table with only a match score as input and a
single-value statistical significance as output, you can expect nonsense.
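
To put rough numbers on this, here is a minimal sketch, not a definitive
model. It assumes the usual logistic Elo expectation formula, a fixed 30% draw
rate (my assumption, picked only for illustration), and a prior that says the
two sides are exactly `gap` points apart with either side equally likely to be
the stronger one. The function names are made up for this example.

```python
from math import comb


def expected_score(elo_diff):
    """Expected score per game under the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def match_likelihood(elo_diff, points=5.5, games=8, draw_rate=0.3):
    """Probability of scoring exactly `points` in `games` games.

    Each game is win/draw/loss with a fixed draw rate (an assumption);
    all win/draw/loss splits that add up to `points` are summed.
    """
    e = expected_score(elo_diff)
    p_win = e - draw_rate / 2.0
    p_loss = 1.0 - e - draw_rate / 2.0
    if p_win < 0.0 or p_loss < 0.0:
        raise ValueError("draw_rate too high for this Elo difference")

    half_points = round(2 * points)  # work in half-points to stay in integers
    total = 0.0
    for draws in range(games + 1):
        remaining = half_points - draws
        if remaining < 0 or remaining % 2:
            continue
        wins = remaining // 2
        losses = games - wins - draws
        if losses < 0:
            continue
        ways = comb(games, wins) * comb(games - wins, draws)
        total += ways * p_win ** wins * draw_rate ** draws * p_loss ** losses
    return total


def posterior_stronger(gap, points=5.5, games=8, draw_rate=0.3):
    """P(match winner is the stronger side | score), given a known gap.

    Prior: the winner is either +gap or -gap Elo, each with probability 1/2.
    """
    like_plus = match_likelihood(+gap, points, games, draw_rate)
    like_minus = match_likelihood(-gap, points, games, draw_rate)
    return like_plus / (like_plus + like_minus)


if __name__ == "__main__":
    for gap in (5, 200):
        print(gap, round(posterior_stronger(gap), 3))
```

Under these assumptions the same 5.5-2.5 score gives about 0.53 for a 5-point
gap and about 0.997 for a 200-point gap. The score alone does not fix the
answer; what you assumed beforehand does most of the work.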

Vas

>
>>>
>>>>
>>>>
>>>>
>>>>>Uri



Last modified: Thu, 15 Apr 21 08:11:13 -0700
