Computer Chess Club Archives


Subject: Re: More on the "bad math" after an important email...Appeal to you both

Author: Uri Blass

Date: 21:27:21 09/04/02


On September 04, 2002 at 19:06:33, martin fierz wrote:

>On September 04, 2002 at 17:57:06, Robert Hyatt wrote:
>
>>On September 04, 2002 at 17:16:09, martin fierz wrote:
>>
>>>On September 04, 2002 at 13:06:37, Robert Hyatt wrote:
>>>
>>>>On September 04, 2002 at 11:56:29, Uri Blass wrote:
>>>>
>>>>>On September 04, 2002 at 10:25:38, Robert Hyatt wrote:
>>>>>
>>>>>>On September 04, 2002 at 02:47:20, Uri Blass wrote:
>>>>>>
>>>>>>>
>>>>>>>Here I agree with GCP.
>>>>>>>If Vincent's target was to convince the sponsor
>>>>>>>not to look at the speedup of Cray Blitz as real, he probably
>>>>>>>succeeded.
>>>>>>>
>>>>>>>He does not need to prove that the results of the
>>>>>>>speedup are a lie, but only to convince them
>>>>>>>not to trust the results.
>>>>>>>
>>>>>>>The times and not the speedup are the important information.
>>>>>>>
>>>>>>>Times are calculated first and speedup is calculated only
>>>>>>>later after knowing the times.
>>>>>>
>>>>>>I've said it several times, but once more won't hurt, I guess.
>>>>>>
>>>>>>The original speedup numbers came _directly_ from the log files.  Which
>>>>>>had _real_ times in them.  The nodes and times were added _way_ later.
>>>>>>Once you have a speedup for 2,4,8 and 16 processors, you can _clearly_
>>>>>>(and _correctly_) reconstruct either the time, or the nodes searched,
>>>>>>or both.  We _had_ to calculate the nodes searched for reasons already given.
>>>>>>It is possible that the times were calculated in the same way.  I didn't do
>>>>>>that personally, and without the "log eater" I can't confirm whether it was
>>>>>>done or not.
>>>>>>
>>>>>>If you don't trust the speedups, that's something you have to decide, and it
>>>>>>really doesn't matter to me since that program is no longer playing anyway.  In
>>>>>>fact, I don't have any source code for the thing as that was one of the many
>>>>>>things lost when I lost the logs and everything else.
>>>>>>
>>>>>>But, as I said, the paper was about the _performance_.  And the speedup
>>>>>>numbers were direct computations from raw data.  I consider _that_ to be
>>>>>>the important data presented in the paper, along with the description of how
>>>>>>the algorithm worked.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>Usually we tend to trust scientists but if the information
>>>>>>>about times is wrong then it means that
>>>>>>>we cannot trust the other details in the article.
>>>>>>
>>>>>>
>>>>>>
>>>>>>So if the _main_ data is correct, and is then used to calculate something
>>>>>>else, the something-else can't be trusted, and therefore neither can the
>>>>>>main data???
>>>>>>
>>>>>>Perhaps I am missing something...
>>>>>
>>>>>If the something else (the times) was originally used to calculate the main data, then
>>>>>there is a problem.
>>>>>
>>>>>The information that was used to calculate the main data is no less important
>>>>>than the main data, and if we do not have correct information about that information,
>>>>>it is hard to trust the main data (it is clear that we had wrong
>>>>>information about the times).
>>>>>
>>>>>Uri
>>>>
>>>>
>>>>Uri, follow closely:
>>>>
>>>>1.  I computed the speedups by using a log eater that ate the raw search logs
>>>>and grabbed the times, and then computed those and wrote the results out in a
>>>>simple table, exactly as it appears in the article.  The speedups came right
>>>>from the raw data.
>>>>
>>>>2.  We needed (much later) to make a similar table with node counts.  We could
>>>>not directly obtain this because it wasn't in the logs, as I have explained
>>>>previously, because the tests were not run to a fixed depth, but came from a
>>>>real game where iterations were rarely finished before time ran out.  We
>>>>computed the node counts by using the one-processor node counts which we _could_
>>>>get, and then using some internal performance measures gathered during the
>>>>2,4,8 and 16 cpu runs.
>>>>
>>>>3. the time table is something I simply don't recall.  It is certainly possible
>>>>that we computed that the same way we computed the node counts, but note that
>>>>I am talking about doing step 2 and 3 several years _after_ the original test
>>>>was run and the raw speedup table was computed.
>>>
>>>bob, follow closely :-)
>>>
>>>even though you do not remember, the data in the table is *obviously* not really
>>>measured time. if you just divide the time for 1 processor by the time for n
>>>processors you see that immediately - all numbers come out as 1.7 or 1.9 or 7.3
>>>or something very close like 1.703. all second digits after the decimal point come out as 0.
>>>the probability of this happening for random data is 10^-24...
>>>therefore, you certainly did it for the times too.
>>
>>Note I am not disagreeing.  I simply remember having to do it for the nodes,
>>because of the problem in measuring them.  I do not remember doing it (or not
>>doing it) for the times, so as I said, it was likely done that way, but I am
>>not going to say "it absolutely was" without being sure...  Which I am not...
>
>but do you understand the argument? even if you do not remember, and even if you
>are not sure, the probability that you did not measure these numbers is about
>0.999999999999999999999999 = 1-(10^-24). now if that is not enough for you to
>say "it absolutely was" then i don't know ;-)
>
>aloha
>  martin

I agree that the data is enough to be sure that
the times are not measured times, but I have one correction
and some comments.

10^-24 is not the probability that he measured the numbers,
but the probability of always getting 0 in the second
digit of the ratio under the assumption that he measured
the numbers.

I do not know how to calculate the probability that he measured the
numbers because I have no a priori distribution
of belief, but even if I had believed with 99.999% confidence that
he did measure the numbers before reading the information,
the results should be enough to convince me to change my mind.

More than 99.999% is too much trust for anybody.
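To show what I mean, here is a small sketch of the Bayes calculation
(the 99.999% prior and the 10^-24 likelihood are only the numbers from
this discussion, and the code is my own illustration, not anything from
the paper):

from fractions import Fraction

# Prior belief that the times in the table were really measured (99.999%).
prior_measured = Fraction(99999, 100000)
prior_computed = 1 - prior_measured

# Likelihood of seeing 0 in every second digit under each hypothesis:
# 10^-24 is martin's number for measured times; if the times were computed
# from the speedups, the pattern is automatic, so the likelihood is 1.
lik_measured = Fraction(1, 10**24)
lik_computed = Fraction(1, 1)

posterior_measured = (prior_measured * lik_measured) / \
    (prior_measured * lik_measured + prior_computed * lik_computed)

print(float(posterior_measured))   # about 1e-19, essentially zero

Even a prior that strong cannot survive this data.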

I also have a problem with using the data to calculate a probability,
even with an a priori distribution,
because I do not have a defined test with H0 and H1.

I found something strange that has probability 10^-24, but
the probability of finding some strange pattern with
probability 10^-24 may be more than 10^-24, because
there may be other
strange patterns that I did not think about.
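For example, with a made-up number of possible patterns (a sketch only,
the real number of patterns I might have noticed is unknown):

import math

p_one_pattern = 1e-24      # probability of one specific strange pattern
k_patterns = 10**6         # hypothetical number of patterns I might have noticed

# Probability that at least one of them shows up; for such small p this is
# about k_patterns * p_one_pattern (expm1/log1p avoid the rounding error that
# the naive 1 - (1 - p)**k would suffer in floating point).
p_at_least_one = -math.expm1(k_patterns * math.log1p(-p_one_pattern))
print(p_at_least_one)      # about 1e-18: a million times 10^-24, but still tiny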

On the other hand, the strange thing is not only the 0 in the second digit:
there is also a 0 in the third digit in most of the cases.

Another point is that
10^-24 is the probability only if we assume
a uniform distribution of the second digit.

This is a good approximation, but
I guess that it is not exactly correct.
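A quick simulation sketch (with arbitrary time distributions that I invented,
not the real Cray Blitz times) shows that the second digit of such ratios is
close to uniform, but only approximately:

import random
from collections import Counter

random.seed(1)
second_digits = Counter()
for _ in range(100000):
    t1 = random.uniform(100.0, 1000.0)   # hypothetical 1-processor time in seconds
    tn = random.uniform(20.0, 500.0)     # hypothetical n-processor time in seconds
    ratio = t1 / tn                      # the "speedup" that martin divided out
    second_digits[int(ratio * 100) % 10] += 1   # second digit after the decimal point

print(sorted(second_digits.items()))
# Every digit comes out near 10% of the time, so 10^-24 for 24 zeros in a row
# is a good approximation under this kind of distribution, even if not exact.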

Uri


