Author: Robert Hyatt
Date: 10:28:21 01/06/04
On January 06, 2004 at 12:21:46, Uri Blass wrote:

>On January 06, 2004 at 11:08:25, Robert Hyatt wrote:
>
>>On January 05, 2004 at 19:33:05, Uri Blass wrote:
>>
>>>On January 05, 2004 at 19:05:59, Rolf Tueschen wrote:
>>>
>>>>On January 05, 2004 at 18:51:10, Uri Blass wrote:
>>>>
>>>>>On January 05, 2004 at 18:30:32, Rolf Tueschen wrote:
>>>>>
>>>>>>On January 05, 2004 at 18:18:57, Anthony Cozzie wrote:
>>>>>>
>>>>>>>On January 05, 2004 at 13:52:39, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On January 05, 2004 at 11:07:03, Vincent Diepeveen wrote:
>>>>>>>>
>>>>>>>>>On January 04, 2004 at 00:43:30, Ed Trice wrote:
>>>>>>>>>
>>>>>>>>>Hi Ed,
>>>>>>>>>
>>>>>>>>>It was my intention to stop posting in the amateur forum,
>>>>>>>>
>>>>>>>>Why don't you take your "non-amateur" stuff back to the forum for the "world's foremost authority on everything" (which has only one member of course, so you _never_ have to defend anything you post there) and leave the rest of us alone?
>>>>>>>>
>>>>>>>>Your "air of superiority" is sickening, IMHO.
>>>>>>>>
>>>>>>>>BTW, exactly how many copies of your program have you sold, to qualify you to be "non-amateur"???
>>>>>>>
>>>>>>>This is quite clearly an amateur forum. The vast majority of the members here, including you and me, are not paid to write chess programs.
>>>>>>>
>>>>>>>I know you and Vincent don't get along, but you seem to be able to take offense at the mildest things when he writes them . . .
>>>>>>>
>>>>>>>anthony
>>>>>>
>>>>>>Excuse me if I contradict. IMO Bob Hyatt reacted to Vincent's vocabulary with the maximum possible friendliness for an academic. I fear you underestimate the nonsense V. is writing from time to time. Others would stop all communication with such a correspondent. In Vincent's case Bob tried to be an elderly critic full of mild irony, while V. goes into crass verbal de-regulations. But the limit is reached when you unjustifiably accuse a scientist of fraud. A scientist without commercial interest in computer chess. Somewhere there must be a limit!
>>>>>>
>>>>>>You can criticise all you want and a normal scientist will be happy to have a dispute with you. But somehow you must also show some respect for the academic education. Look, the criticism by Hyatt and yours truly of the TD board in Graz is academically sound because it's logically based on the rules and reality. Vincent, however, has no case at all and he is still talking about 'fraud'.
>>>>>>
>>>>>>Rolf
>>>>>
>>>>>Note that there were other people who criticized that article, including me, but saying that some data is wrong, or even saying that we cannot trust one article of Hyatt's, is different from blaming him the way Vincent did.
>>>>
>>>>That is the first argument; for the second, considering your own criticism above, please read http://www.talkchess.com/forums/1/message.html?340359 and then say what you think Bob did wrong.
>>>>
>>>>1. The original data are ok.
>>>>
>>>>2. There were interpolations; there might be something inexact.
>>>>
>>>>I think we must differentiate between these two cases. If you simply speak of "data" this could be confusing. The interpolations might be faulty but NOT the original data. That is at least what Bob is saying, IMO. I remember we also had a debate about how such a thing could happen, but Bob explained how it could well happen during the publication process. It was certainly not a fraud or anything close to it.
>>>>It is strange that Vincent has misunderstood it.
>>>>
>>>>Rolf
>>>
>>>We certainly cannot claim that we are sure that it was a fraud, but the fact that the interpolations were not mentioned in the publication gives a reason to have doubts about trusting the article.
>>
>>You can trust what you want. I'll briefly recap _ONE MORE TIME_. The original data used to compute the speedup numbers was derived directly from the log files. The speedups were computed to the nearest tenth (xx.x) for no good reason other than that going more accurate is pointless when even the .x part varies significantly from run to run.
>>
>>I wrote the paper based solely on the raw times and resulting speedup numbers. Later I was asked to supply the node counts, and they were simply not available. It only took about 2 years from start to get this written and published, and the raw data was lost somewhere in 1995 (approximately). I simply computed the nodes based on the speedup and time and NPS CB produced.
>>
>>I _really_ don't care whether anyone trusts the node counts or not. They are _meaningless_ to anyone, except as a method used to explain why the speedup is not 16X on a 16 cpu machine. CB searched at roughly 16X the NPS when using 16 cpus, but it almost never ran 16X faster. The nodes would climb (this is called search overhead) to make the parallel search do more work than the serial search.
>>
>>To understand this, why don't you take Crafty? I will supply you a couple of logs for 1, 2 and 4 cpus. You take the raw search time to a specific depth for each move and record it. Compute the speedup. Then take the rough NPS number and see if you can compute the nodes searched by multiplying the NPS by the time used. That's all I did in CB, and the numbers are _very_ accurate. The only thing that Vincent seems to have a problem with is that the nodes reported in the paper are +exactly+ proportional to the times reported, because they were derived from them.
>>
>>I'll leave it to you to do the computation and see whether or not you like the numbers.
>>
>>Here is just one sample. I can send you a log if you want.
>>
>>Note that this is run on my dual, so I would hope for speeds about 2x faster even though it reports 4 cpus (this is hyper-threading).
>>
>>log.001: time=27.48 cpu=99% mat=0 n=28197633 fh=92% nps=1.03M
>>log.002: time=13.50 cpu=387% mat=0 n=28512307 fh=92% nps=2.11M
>>
>>Now, the above is for a fixed search depth.
>>
>>real data:
>>
>>1cpu time=27.48 4cpu time=13.50 speedup=2.0
>>
>>actual 4cpu nodes 28512307.
>>computed 4cpu nodes 28350000.
>>
>>You be the judge of how "fake" that last number is. The only problem is that if you divide the first nodes by time, you get some number, while if you divide the computed nodes by time, you get _exactly_ the reported NPS. That is what Vincent went south about. My data was simply off if you wanted to compute more than one decimal place, because it was derived from two numbers, one of which was accurate to only _one_ decimal place.
>>
>>If you believe that is "faking" then more power to you.
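For anyone who wants to check the arithmetic, here is a minimal sketch of that computation in C. The times, 1-cpu node count and NPS come straight from the two log lines above; rounding the NPS to 2.1M mirrors the one-decimal-place rounding described, so treat the output as an illustration of the method, not the paper's exact procedure.

#include <stdio.h>

int main(void) {
    double t1  = 27.48;    /* 1-cpu search time from log.001 */
    double tn  = 13.50;    /* "4-cpu" (hyper-threaded dual) time from log.002 */
    double nps = 2.1e6;    /* reported parallel NPS, ~2.11M rounded to one place */
    long   n1  = 28197633; /* actual 1-cpu node count from log.001 */

    double speedup  = t1 / tn;          /* 27.48 / 13.50 = ~2.0 */
    double nodes    = nps * tn;         /* derived nodes = NPS * time */
    double overhead = nodes / n1 - 1.0; /* extra work done by the parallel
                                           search ("search overhead") */

    printf("speedup %.1f\n", speedup);      /* 2.0 */
    printf("derived nodes %.0f\n", nodes);  /* 28350000 */
    printf("search overhead %.1f%%\n", overhead * 100.0);
    return 0;
}

Dividing the derived node count by the time gives back _exactly_ the reported NPS, which is the proportionality Vincent objected to; it is built into the derivation, not evidence of anything.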
>>
>>As for what Vincent believes, I may one day post an email or two here that _really_ explains his problem. And it will _really_ show his moral standards for reporting results.
>>
>>He wrote me once saying that he was trying to convince some agency to give him time, and he _knew_ he couldn't produce the kinds of speedups I did on the Cray. He said he saw two choices: (1) discredit my results; (2) explain why current programs can't produce decent speedups, and he wanted to blame this on null-move.
>>
>>I pointed out that null-move did _not_ make a significant difference, and I ran tests for him to show this, even though he was claiming it everywhere. I also pointed out that a NUMA machine would _never_ approach the performance of a pure SMP machine, but he simply could not grasp that idea and it went nowhere. So, since he couldn't figure out a way to justify his poor results, which were mainly a result of a poor architecture, he chose to try to discredit results that were better than his. And he talks about _me_ trying to commit academic fraud.
>>
>>He _really_ needs to look in the mirror.
>>
>>The emails I saved are much more revealing of his true character, as he was clearly intending to use that machine, period. His main goal was not to do good science, but to do whatever it took to impress his "sponsor".
>>
>>When you think about it, that is _not_ the way to do research.
>>
>>>Hyatt gave an explanation, but the problem is that the explanation was given too late and not at the time of the publication.
>>
>>As I said, do the above computations, _then_ decide whether the node numbers are wrong enough to even consider. IE my speedup numbers are rounded to the nearest tenth. They _could_ have been published to 9 decimal places. Would that have enhanced anything? The node numbers _could_ have been rounded to the nearest one hundred thousand. Would _that_ have made them wrong? Would it have made _any_ difference to the paper, which didn't even discuss the node counts specifically?
>>
>>First do the math, _then_ decide what is significant and what is not.
>>
>>I've already done that. And if you do the computation above, you might have a different opinion.
>>
>>>I usually believe that data is correct, but if Bob Hyatt remembers to give more information only after people find mistakes, then we can wonder and suspect that some more information is hidden, and that is a reason to have doubts about the article.
>>
>>Fine, then simply ignore it. If you believe _that_, then you probably would think that the speedup numbers are wrong, simply because they were rounded and those 1/1000ths are important. I, however, _know_ that even the 1/10ths are meaningless in parallel speedups.
>>
>>>Note that I do not claim that data that is calculated based on interpolation is a mistake, but not mentioning it in time is a mistake.
>>>
>>>Uri
>>
>>Why don't you re-read the article, and notice that the node counts are not mentioned in the paper? Why? Because they were not _in_ the paper's original contents. They are _meaningless_ to the context of the paper, and the only thing they show is that a 2-cpu search usually searches a larger tree than a 1-cpu search. Without the node values, you could make either of the two assumptions and they _could_ be correct:
>>
>>(1) the speed-up was less than optimal because processors were all the time busy waiting on each other and not doing useful work;
>>
>>(2) the speed-up was less than optimal because processors were busy searching all the time, but in the parallel search they searched a larger tree than the serial search.
>>
>>_that_ was why the request for node counts was originally made. And the numbers given are _perfect_ with respect to showing that (2) was the case in Cray Blitz (and it is also the case for Crafty and any other parallel search program I have seen), until you get to NUMA, where (2) is _still_ a major factor but suddenly (1) becomes measurable as well, due to memory latency issues.
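One way to see why the node counts separate (1) from (2): if no cpu ever sits idle, the parallel search runs at roughly N times the serial NPS, so the speedup factors into the ideal N divided by the search-overhead ratio. A small sketch, with made-up node counts purely for illustration:

#include <stdio.h>

int main(void) {
    int    N       = 16;     /* cpu count */
    double nodes_1 = 1.0e9;  /* hypothetical serial tree size */
    double nodes_N = 1.4e9;  /* hypothetical parallel tree, i.e. 40%
                                search overhead */

    /* time_N = nodes_N / (N * nps_1) when nothing idles, so: */
    double speedup = N * (nodes_1 / nodes_N);

    printf("speedup %.1f out of an ideal %d\n", speedup, N); /* 11.4 */
    return 0;
}

If (1) were the culprit instead, the node counts would stay near the serial count while the cpus sat idle, and the NPS would no longer scale with N.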
>>
>>So the node numbers were requested even though the paper _clearly_ states that in CB, _all_ cpus search _all_ the time. A CPU _never_ sits idle waiting on something to do for more than a few milliseconds out of 4-5 minutes total time. You could take the node numbers out of the paper, and it would still tell the _same_ story. _perfectly_. And in the review process, only _one_ reviewer even wanted me to go back and add the node numbers. The point being that the node counts were not needed in the context of the paper as written; they were simply requested for conformity with what had been published by _other_ authors on various parallel search numbers.
>>
>>Had Vincent bothered to look at my dissertation, which I pointed him to multiple times, he would have seen _real_ numbers from front to back, and in the case of those numbers, I still have the original printed log files in my office, so they were not lost. Of course, he wouldn't let a little data get in the way of his quest to get access to a big machine, so that really didn't matter much.
>>
>>But, form your _own_ opinion. If you think I fake data, then by all means ignore anything further I write/post here. If you think (as I do) that maybe the real fake here is Vincent (SOS kills Crafty, any debugged program kills Crafty, his parallel speedup is always > 2 for 2 cpus, etc), then note that his results are _never_ reproduced by anyone. I produced a test run that had 4 processors running 3.1x faster. He ran a test where he got _no_ speedup using Crafty. I ran his test positions and got a speedup of 3.0x. He said "aha, your 3.1x number is phony, you only got 3.0x." Of course, he ignored that _he_ had reported that I got 1.01x or some such nonsense, something that _nobody_ has ever repeated. There is "faking" and there is "FAKING". I don't consider extrapolation to be "faking" at all. Of course it _should_ have been mentioned in the paper. But it was added after the fact, and during that part of the review process, we were actively _reducing_ the size of the paper, not thinking about _adding_ more text, because of requests from the editors to "keep it as short as possible."
>>
>>I'm not going to keep repeating this explanation. If we use the term "fraud", it ought to be associated with someone that does it _all_ the time (Vincent) rather than me. I don't consider that paper "fraud" at all. I considered the node counts so unimportant that I didn't even remember how we had done 'em until after the issue was raised. It was _that_ unimportant to me, since the numbers were _that_ unimportant to the paper as written.
>
>I agree that Vincent is saying nonsense.
>I understand your explanation.
>
>I did not claim that there was a fake, but only that the fact that the extrapolation was not mentioned at that time is a problem.

I actually believe that it _was_ mentioned. However, as Jaap and I (and the reviewers) were looking at the thing, we removed this, removed that, dumped this paragraph, etc. And, to the best of my recollection, at some point one of us deleted the paragraph with that single sentence.
I don't have all the intermediate revisions, as that was a painful process: Jaap would send me MSWord documents, I would get someone to print them, and then I would send back revisions. It was ugly, as his version of MSWord and whatever I had on Linux at the time were simply incompatible. Today I can work on MSWord documents just fine with OpenOffice, but not back then.

>I remember that GCP said that because maybe other things were not mentioned, he cannot give a scientific value to the article.

That's up to him. I notice that both he _and_ Vincent had enough questions about the parallel search stuff, and they certainly had no problems in asking me to explain everything in great detail. Again, the nodes were an "after-thought" by someone. You could take that table out, and the paper would be no worse. In fact, it would be a good bit shorter, but one referee wanted them for comparison to other papers on the same subject previously published. I had enough trouble with them about not using the Kopec/Bratko positions. :)

>I considered the fact that extrapolation was done as important, because I remember that the impression based on reading the data was that the speed improvement of Cray Blitz was almost linear in the number of processors, and there is a difference between being 1.99 times faster with 2 processors and 1.9 times faster.

I will say this again. Times _vary_. I have published _many_ examples here, so why you would want .01 accuracy on a number that is hardly accurate to .1 is beyond me. I really don't even like the .1 speedup numbers, but .1 is within some reasonable error bound. But forget about .01 or .001.

>If you want to compare the speed improvement of Cray Blitz with the speed improvement of other chess programs, then it is important to know if you are 1.99 times faster or 1.9 times faster.

Not really. The proper test is whether abs(CB - newprog) < delta, where delta is some small number, and .1 is the _smallest_ value I would use. IE most in parallel search would say a speedup of 2.0 vs 1.8 is "very close". What was more interesting was _not_ the 1.9 vs 2.0 rounding, but what happened with 16 processors, which clearly showed that things were "getting worse" and not "better". However, there were some things left to be tried that I never did for CB in the test I published. One is the trick in current Crafty that lets the operator say "only use 4 (or N) cpus at a specific split point". That prevents too much "inter-processor chatter" and too many splits where there are not even 16 moves to search in parallel. Crafty does that, although on 2-4 cpu systems I have not used it.

But for heaven's sake, do _not_ pay attention to the fractional part of speedup numbers. Maybe between 1 and 2 cpus the number _might_ have some very minor utility, but with 4, even .1 is misleading, and beyond 4, whole numbers are _more_ than enough, as that hides a lot of the "jitter". If you want to look at .1 accuracy, you _must_ run a bunch of tests, and then you need to do some statistical analysis to show the standard deviation and variance to put that .1 accuracy into context. IE .1 with a variance of .7 means that .1 is _not_ very informative. :)
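A minimal sketch of that kind of check, with made-up speedup samples standing in for repeated runs:

#include <stdio.h>
#include <math.h>

int main(void) {
    double s[] = {1.8, 2.1, 1.6, 2.3, 1.9, 2.0}; /* hypothetical repeated runs */
    int n = sizeof s / sizeof s[0];
    double sum = 0.0, var = 0.0;

    for (int i = 0; i < n; i++) sum += s[i];
    double mean = sum / n;
    for (int i = 0; i < n; i++) var += (s[i] - mean) * (s[i] - mean);
    var /= n - 1; /* sample variance */

    printf("mean %.2f variance %.2f stddev %.2f\n", mean, var, sqrt(var));
    return 0;
}

If the standard deviation comes out comparable to .1, then quoting a speedup to .1, never mind .01, says very little.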
>I understand that comparing between a NUMA machine and Cray Blitz is wrong, but a comparison between Cray Blitz and other machines can still be relevant.

Probably not. There will likely not ever be any more SMP machines with 16 and 32 processors that are _true_ SMP with respect to memory. That means that the Cray will _always_ have a significant performance advantage. Today, I don't know of any 32 processor box you can buy that is not NUMA, other than the Cray T90. I tried to explain that to Vincent more than once, in fact, but important details slip right by when he is in "Hyper-Vincent mode". So comparing to Cray Blitz is not very useful today, and I really don't spend much time thinking about Crafty vs CB parallel stuff, as the machines are simply way too different to compare reasonably. However, if we all run on 16-way Opterons, then that will let us compare parallel algorithms. But from a science point of view, you _never_ want two degrees of freedom in an experiment (IE different hardware _and_ different software), as that makes it very difficult to attribute performance to one or the other. There are ways to statistically analyze such data, but it requires _much_ more raw data (hard to produce on machines like the C90/T90) and it is not very easy to understand the results...

>Uri