Computer Chess Club Archives


Subject: Re: More on the "bad math" after an important email...Appeal to you both

Author: martin fierz

Date: 14:16:09 09/04/02


On September 04, 2002 at 13:06:37, Robert Hyatt wrote:

>On September 04, 2002 at 11:56:29, Uri Blass wrote:
>
>>On September 04, 2002 at 10:25:38, Robert Hyatt wrote:
>>
>>>On September 04, 2002 at 02:47:20, Uri Blass wrote:
>>>
>>>>
>>>>I here agree with GCP
>>>>If Vincent's target was to convince the sponsor
>>>>not to look at the speedup of crayblitz as real he probably
>>>>succeeded.
>>>>
>>>>He does not need to prove that the results of the
>>>>speed up are a lie but only to convince them
>>>>not to trust the results.
>>>>
>>>>The times and not the speedup are the important information.
>>>>
>>>>Times are calculated first and speedup is calculated only
>>>>later after knowing the times.
>>>
>>>I've said it several times, but once more won't hurt, I guess.
>>>
>>>The original speedup numbers came _directly_ from the log files.  Which
>>>had _real_ times in them.  The nodes and times were added _way_ later.
>>>Once you have a speedup for 2,4,8 and 16 processors, you can _clearly_
>>>(and _correctly_) reconstruct either the time, or the nodes searched,
>>>or both.  We _had_ to calculate the nodes searched for reasons already given.
>>>It is possible that the times were calculated in the same way.  I didn't do
>>>that personally, and without the "log eater" I can't confirm whether it was
>>>done or not.
>>>
>>>If you don't trust the speedups, that's something you have to decide, and it
>>>really doesn't matter to me since that program is no longer playing anyway.  In
>>>fact, I don't have any source code for the thing as that was one of the many
>>>things lost when I lost the logs and everything else.
>>>
>>>But, as I said, the paper was about the _performance_.  And the speedup
>>>numbers were direct computations from raw data.  I consider _that_ to be
>>>the important data presented in the paper, along with the description of how
>>>the algorithm worked.
>>>
>>>
>>>
>>>>
>>>>Usually we tend to trust scientists but if the information
>>>>about times is wrong then it means that
>>>>we cannot trust the other details in the article.
>>>
>>>
>>>
>>>So if the _main_ data is correct, and is then used to calculate something
>>>else, the something-else can't be trusted, and therefore neither can the
>>>main data???
>>>
>>>Perhaps I am missing something...
>>
>>If the something else(times) was originally used to calculate the main data then
>>there is a problem.
>>
>>The information that was used to calculate the main data is not less important
>>than the main data and if we have not correct information about the information
>>there is a problem to trust the main data(it is clear that we had wrong
>>information about times).
>>
>>Uri
>
>
>Uri, follow closely:
>
>1.  I computed the speedups by using a log eater that ate the raw search logs
>and grabbed the times, and then computed those and wrote the results out in a
>simple table, exactly as it appears in the article.  The speedups came right
>from the raw data.
>
>2.  We needed (much later) to make a similar table with node counts.  We could
>not directly obtain this because it wasn't in the logs, as I have explained
>previously, because the tests were not run to a fixed depth, but came from a
>real game where iterations were rarely finished before time ran out.  We
>computed the node counts by using the one-processor node counts which we _could_
>get, and then using some internal performance measures gathered during the
>2,4,8 and 16 cpu runs.
>
>3. the time table is something I simply don't recall.  It is certainly possible
>that we computed that the same way we computed the node counts, but note that
>I am talking about doing step 2 and 3 several years _after_ the original test
>was run and the raw speedup table was computed.

bob, follow closely :-)

even though you do not remember, the data in the table is *obviously* not really
measured time. if you just divide the time for 1 processor by the time for n
processors you see that immediately - all numbers come out as 1.7 or 1.9 or 7.3
or something very close like 1.703. all 2nd digits after the . come out as 0.
the probability for this happening for random data is 10 to the -24...
therefore, you certainly did it for the times too.
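
The digit argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a check anyone in the thread actually ran: the times in `pairs` are hypothetical, the function names are made up for this example, and the probability assumes that for genuinely measured times each ratio would land within 0.005 of a one-decimal value with chance 1/10, independently of the others. The count of 24 ratios is taken from the "10 to the -24" figure quoted above.

```python
from fractions import Fraction


def looks_rounded(ratio: float, tol: float = 0.005) -> bool:
    """True if ratio is within tol of a value with one decimal digit.

    With tol = 0.005 this accepts 1.7 or 1.703 and rejects 1.75.
    """
    return abs(ratio - round(ratio, 1)) < tol


def chance_all_rounded(count: int) -> Fraction:
    """Chance that `count` independent ratios all look rounded.

    Assumption: for genuinely measured times each ratio passes with
    probability 1/10, independently of the others.
    """
    return Fraction(1, 10) ** count


# Hypothetical (time for 1 processor, time for n processors) pairs, in seconds.
pairs = [(2830, 1665), (2830, 1490), (1000, 570)]
for t1, tn in pairs:
    print(f"{t1}/{tn} = {t1 / tn:.3f}  rounded: {looks_rounded(t1 / tn)}")

# Chance that 24 measured ratios would all pass by accident.
print(float(chance_all_rounded(24)))
```

If every ratio in a table passes `looks_rounded`, the simplest explanation is that the times were derived from speedups already rounded to one digit, which is the conclusion drawn above.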

the real point is that there is *no way* you could have measured those search
times, and that if you were to claim you really did measure them, you would be a
*proven* fraud. but if, as you say, you measured the speedup to 1 digit, and not
the real time, then it all makes sense - except that you did something you
shouldn't really do...

aloha
  martin


>Conclusions:
>
>1.  the speedup data came directly from five large log files, run thru a
>program that matched up depths and moves and grabbed the time for the one
>that was of interest (the last move displayed in the real game).  This data
>I have 100% confidence in as representing actual raw data.
>
>2.  The times/nodes I am not sure about.  They were produced either in 1996
>or early 1997.  According to annual faculty activity reports here, I started
>working on this paper in 1993, and submitted it late in 1994.  We haggled over
>various things for about two years, back and forth.  It was actually published
>in the March 1997 JICCA.  The key is that the speedup data was produced in
>late 1993 and early 1994, right after the 1993 ACM event where the game in
>question was played.  The paper was finished a couple of years later.  It is
>certainly possible that this happened after I lost all files here so that we
>had to extrapolate times based on the rather simple data I have in my paper
>files.
>
>The actual data I have here, is the 24 positions, the time taken on the 16
>processor test, and then the printout of the raw speedup table that is in
>the JICCA.  So it is certainly possible that the times _and_ nodes were
>extrapolated.  It is possible that it was done because it was easier than trying
>to round up the old logs and produce them by eating those.  It is possible it
>was done because the logs were lost.  I have tried to remember when the disk
>crash happened here...  and I will probably probe DejaNews as when it happened,
>I immediately sent out an appeal for any old crafty versions since they were all
>lost, excepting the ones on my ftp machine (a different box).  That might help
>in remembering more.  But doing this paper was mainly an effort from 1993-1994.
>The later additions, such as the two additional tables, more explanations about
>some parts, less info about others, was done over the next 2-3 years, and it
>was done _very_ sporadically.  That's why I don't remember a lot of specific
>details, it was spread over a long time, with a lot of other things going on,
>and didn't seem very important when we were doing it.  Had I thought "Hey I
>am really cheating the world here." I would have at least remembered that.  But
>the extrapolation seemed quite accurate as we ran a few positions and
>extrapolated and compared that to real positions, just to be sure the
>extrapolations were reasonable.  They were, and we never gave the node issue
>another moment's thought.  Until now, of course..


