Computer Chess Club Archives


Subject: Re: More on the "bad math" after an important email...Appeal to you both

Author: martin fierz

Date: 16:06:33 09/04/02


On September 04, 2002 at 17:57:06, Robert Hyatt wrote:

>On September 04, 2002 at 17:16:09, martin fierz wrote:
>
>>On September 04, 2002 at 13:06:37, Robert Hyatt wrote:
>>
>>>On September 04, 2002 at 11:56:29, Uri Blass wrote:
>>>
>>>>On September 04, 2002 at 10:25:38, Robert Hyatt wrote:
>>>>
>>>>>On September 04, 2002 at 02:47:20, Uri Blass wrote:
>>>>>
>>>>>>
>>>>>>I here agree with GCP
>>>>>>If Vincent's target was to convince the sponsor
>>>>>>not to look at the speedup of crayblitz as real he probably
>>>>>>succeeded.
>>>>>>
>>>>>>He does not need to prove that the results of the
>>>>>>speed up are a lie but only to convince them
>>>>>>not to trust the results.
>>>>>>
>>>>>>The times and not the speedup are the important information.
>>>>>>
>>>>>>Times are calculated first and speedup is calculated only
>>>>>>later after knowing the times.
>>>>>
>>>>>I've said it several times, but once more won't hurt, I guess.
>>>>>
>>>>>The original speedup numbers came _directly_ from the log files.  Which
>>>>>had _real_ times in them.  The nodes and times were added _way_ later.
>>>>>Once you have a speedup for 2,4,8 and 16 processors, you can _clearly_
>>>>>(and _correctly_) reconstruct either the time, or the nodes searched,
>>>>>or both.  We _had_ to calculate the nodes searched for reasons already given.
>>>>>It is possible that the times were calculated in the same way.  I didn't do
>>>>>that personally, and without the "log eater" I can't confirm whether it was
>>>>>done or not.
>>>>>
>>>>>If you don't trust the speedups, that's something you have to decide, and it
>>>>>really doesn't matter to me since that program is no longer playing anyway.  In
>>>>>fact, I don't have any source code for the thing as that was one of the many
>>>>>things lost when I lost the logs and everything else.
>>>>>
>>>>>But, as I said, the paper was about the _performance_.  And the speedup
>>>>>numbers were direct computations from raw data.  I consider _that_ to be
>>>>>the important data presented in the paper, along with the description of how
>>>>>the algorithm worked.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>Usually we tend to trust scientists but if the information
>>>>>>about times is wrong then it means that
>>>>>>we cannot trust the other details in the article.
>>>>>
>>>>>
>>>>>
>>>>>So if the _main_ data is correct, and is then used to calculate something
>>>>>else, the something-else can't be trusted, and therefore neither can the
>>>>>main data???
>>>>>
>>>>>Perhaps I am missing something...
>>>>
>>>>If the something else (times) was originally used to calculate the main data then
>>>>there is a problem.
>>>>
>>>>The information that was used to calculate the main data is no less important
>>>>than the main data, and if we do not have correct information about it,
>>>>it is hard to trust the main data (it is clear that we had wrong
>>>>information about times).
>>>>
>>>>Uri
>>>
>>>
>>>Uri, follow closely:
>>>
>>>1.  I computed the speedups by using a log eater that ate the raw search logs
>>>and grabbed the times, and then computed those and wrote the results out in a
>>>simple table, exactly as it appears in the article.  The speedups came right
>>>from the raw data.
>>>
>>>2.  We needed (much later) to make a similar table with node counts.  We could
>>>not directly obtain this because it wasn't in the logs, as I have explained
>>>previously, because the tests were not run to a fixed depth, but came from a
>>>real game where iterations were rarely finished before time ran out.  We
>>>computed the node counts by using the one-processor node counts which we _could_
>>>get, and then using some internal performance measures gathered during the
>>>2,4,8 and 16 cpu runs.
>>>
>>>3. the time table is something I simply don't recall.  It is certainly possible
>>>that we computed that the same way we computed the node counts, but note that
>>>I am talking about doing step 2 and 3 several years _after_ the original test
>>>was run and the raw speedup table was computed.
>>
>>bob, follow closely :-)
>>
>>even though you do not remember, the data in the table is *obviously* not really
>>measured time. if you just divide the time for 1 processor by the time for n
>>processors you see that immediately - all numbers come out as 1.7 or 1.9 or 7.3
>>or something very close like 1.703. all 2nd digits after the . come out as 0.
>>the probability for this happening for random data is 10 to the -24...
>>therefore, you certainly did it for the times too.
>
>Note I am not disagreeing.  I simply remember having to do it for the nodes,
>because of the problem in measuring them.  I do not remember doing it (or not
>doing it) for the times, so as I said, it was likely done that way, but I am
>not going to say "it absolutely was" without being sure...  Which I am not...

but do you understand the argument? even if you do not remember, and even if you
are not sure, the probability that you did not measure these numbers is about
0.999999999999999999999999 = 1-(10^-24). now if that is not enough for you to
say "it absolutely was" then i don't know ;-)
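
to make the arithmetic concrete, here is a minimal sketch of the check described above: divide the 1-processor time by the n-processor time, look at the second digit after the decimal point, and compute how likely it is that this digit is 0 for every ratio if the digits were random. the times used here are made-up placeholders, not values from the paper, and the function names are hypothetical. it assumes each second digit is independent and uniform over 0-9, and takes 24 ratios to match the 10^-24 figure.

```python
import math
from fractions import Fraction


def second_decimal_digit(t_one_cpu, t_n_cpu):
    """Second digit after the decimal point of t_one_cpu / t_n_cpu.

    Uses exact rational arithmetic so that a ratio such as 1.7 is not
    disturbed by floating-point rounding.
    """
    ratio = Fraction(t_one_cpu, t_n_cpu)
    return math.floor(ratio * 100) % 10


def chance_all_zero(n_ratios):
    """Probability that n_ratios independent, uniformly random digits are all 0."""
    return Fraction(1, 10) ** n_ratios


# Hypothetical times in seconds, NOT taken from the paper.
print(second_decimal_digit(1703, 1000))  # ratio 1.703: looks like a rounded 1.7
print(second_decimal_digit(1000, 571))   # ratio about 1.751: looks like a measurement

# Assumption: 24 ratios, one per test position.
p = chance_all_zero(24)
print(float(p))      # chance of this happening with really measured times
print(float(1 - p))  # chance that the times were computed from the speedups
```

the last value prints as 1.0 because a float cannot represent the difference; the exact value is kept in the Fraction.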

aloha
  martin




>>the real point is that there is *no way* you could have measured those search
>>times, and that if you were to claim you really did measure them, you would be a
>>*proven* fraud. but if, as you say, you measured the speedup to 1 digit, and not
>>the real time, then it all makes sense - except that you did something you
>>shouldn't really do...
>>
>>aloha
>>  martin
>
>
>The main problem was that the speedup numbers were all I originally produced.
>I had published enough times and node counts and averages of times and averages
>of node counts in my dissertation that there seemed to be "enough" of that kind
>of data already.
>
>A couple of years later we were asked to add the nodes and times.  After we
>were asked multiple times to "shorten it up".  That's all I can say...
>
>
>
>
>
>>
>>
>>>Conclusions:
>>>
>>>1.  the speedup data came directly from five large log files, run thru a
>>>program that matched up depths and moves and grabbed the time for the one
>>>that was of interest (the last move displayed in the real game).  This data
>>>I have 100% confidence in as representing actual raw data.
>>>
>>>2.  The times/nodes I am not sure about.  They were produced either in 1996
>>>or early 1997.  According to annual faculty activity reports here, I started
>>>working on this paper in 1993, and submitted it late in 1994.  We haggled over
>>>various things for about two years, back and forth.  It was actually published
>>>in the March 1997 JICCA.  The key is that the speedup data was produced in
>>>late 1993 and early 1994, right after the 1993 ACM event where the game in
>>>question was played.  The paper was finished a couple of years later.  It is
>>>certainly possible that this happened after I lost all files here so that we
>>>had to extrapolate times based on the rather simple data I have in my paper
>>>files.
>>>
>>>The actual data I have here, is the 24 positions, the time taken on the 16
>>>processor test, and then the printout of the raw speedup table that is in
>>>the JICCA.  So it is certainly possible that the times _and_ nodes were
>>>extrapolated.  It is possible that it was done because it was easier than trying
>>>to round up the old logs and produce them by eating those.  It is possible it
>>>was done because the logs were lost.  I have tried to remember when the disk
>>>crash happened here...  and I will probably probe DejaNews as when it happened,
>>>I immediately sent out an appeal for any old crafty versions since they were all
>>>lost, excepting the ones on my ftp machine (a different box).  That might help
>>>in remembering more.  But doing this paper was mainly an effort from 1993-1994.
>>>The later additions, such as the two additional tables, more explanations about
>>>some parts, less info about others, was done over the next 2-3 years, and it
>>>was done _very_ sporadically.  That's why I don't remember a lot of specific
>>>details, it was spread over a long time, with a lot of other things going on,
>>>and didn't seem very important when we were doing it.  Had I thought "Hey I
>>>am really cheating the world here." I would have at least remembered that.  But
>>>the extrapolation seemed quite accurate as we ran a few positions and
>>>extrapolated and compared that to real positions, just to be sure the
>>>extrapolations were reasonable.  They were, and we never gave the node issue
>>>another moment's thought.  Until now, of course..


