Author: Robert Hyatt
Date: 08:08:25 01/06/04
On January 05, 2004 at 19:33:05, Uri Blass wrote:

>On January 05, 2004 at 19:05:59, Rolf Tueschen wrote:
>
>>On January 05, 2004 at 18:51:10, Uri Blass wrote:
>>
>>>On January 05, 2004 at 18:30:32, Rolf Tueschen wrote:
>>>
>>>>On January 05, 2004 at 18:18:57, Anthony Cozzie wrote:
>>>>
>>>>>On January 05, 2004 at 13:52:39, Robert Hyatt wrote:
>>>>>
>>>>>>On January 05, 2004 at 11:07:03, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On January 04, 2004 at 00:43:30, Ed Trice wrote:
>>>>>>>
>>>>>>>Hi Ed,
>>>>>>>
>>>>>>>It was my intention to stop posting in the amateur forum,
>>>>>>
>>>>>>
>>>>>>Why don't you take your "non-amateur" stuff back to the forum for
>>>>>>the "world's foremost authority on everything" (which has only one
>>>>>>member of course, so you _never_ have to defend anything you post
>>>>>>there) and leave the rest of us alone?
>>>>>>
>>>>>>Your "air of superiority" is sickening, IMHO.
>>>>>>
>>>>>>BTW, exactly how many copies of your program have you sold, to qualify you
>>>>>>to be "non-amateur"???
>>>>>
>>>>>This is quite clearly an amateur forum. The vast majority of the members here,
>>>>>including you and me, are not paid to write chess programs.
>>>>>
>>>>>I know you and Vincent don't get along, but you seem to be able to take offense
>>>>>at the mildest things when he writes them . . .
>>>>>
>>>>>anthony
>>>>
>>>>Excuse me if I contradict. IMO Bob Hyatt reacted to Vincent's vocabulary with
>>>>the maximum friendliness possible for an academic. I fear you underestimate the
>>>>nonsense V. writes from time to time. Others would stop all communication
>>>>with such a correspondent. In Vincent's case Bob tried to be an elderly critic
>>>>full of mild irony, while V. lapses into crass verbal excesses. But the limit
>>>>is reached when you unjustifiably accuse a scientist of fraud, a scientist
>>>>without commercial interest in computer chess. Somewhere there must be a limit!
>>>>
>>>>You can criticise all you want and a normal scientist will be happy to have a
>>>>dispute with you. But somehow you must also show some respect for academic
>>>>education. Look, the criticism by Hyatt and yours truly of the TD board in
>>>>Graz is academically sound because it's logically based on the rules and
>>>>reality. Vincent however has no case at all and he is still talking about
>>>>'fraud'.
>>>>
>>>>Rolf
>>>
>>>Note that there were other people who criticized that article, including me, but
>>>saying that some data is wrong, or even saying that we cannot trust one article
>>>of Hyatt's, is different from blaming him like Vincent did.
>>
>>This is the first argument, and second, considering your own criticism above,
>>please read http://www.talkchess.com/forums/1/message.html?340359
>>and then say what you think Bob did wrong.
>>
>>1. The original data are ok
>>
>>2. There were interpolations; there might be something inexact
>>
>>I think we must differentiate between these two cases. If you simply speak of
>>"data" this could be confusing. The interpolations might be faulty but NOT the
>>original data. That is at least what Bob is saying IMO. I remember we also had a
>>debate about how such a thing could happen, but Bob explained how this could well
>>happen during the process of publication. It was certainly not fraud or
>>anything close to it. It is strange that Vincent has misunderstood it.
>>
>>Rolf
>
>We certainly cannot claim that we are sure that it was a fraud, but the fact that
>the interpolations were not mentioned in the publication gives a reason to have
>doubts about trusting the article.

You can trust what you want. I'll briefly recap _ONE MORE TIME_. The original data used to compute the speedup numbers was derived directly from the log files. The speedups were computed to the nearest tenth (xx.x) simply because more precision is pointless when even the .x part varies significantly from run to run.
I wrote the paper based solely on the raw times and the resulting speedup numbers. Later I was asked to supply the node counts, and they were simply not available. It took about 2 years from start to finish to get this written and published, and the raw data was lost somewhere around 1995. I simply computed the node counts from the speedup, the time, and the NPS that CB produced.

I _really_ don't care whether anyone trusts the node counts or not. They are _meaningless_ to anyone, except as a way to explain why the speedup is not 16X on a 16-cpu machine. CB searched at roughly 16X the NPS when using 16 cpus, but it almost never ran 16X faster. The node count would climb (this is called search overhead) because the parallel search does more work than the serial search.

To understand this, take Crafty: I will supply you a couple of logs for 1, 2 and 4 cpus. Take the raw search time to a specific depth for each move and record it. Compute the speedup. Then take the rough NPS number and see if you can compute the nodes searched by multiplying the NPS by the time used. That's all I did in CB, and the numbers are _very_ accurate. The only thing Vincent seems to have a problem with is that the node counts reported in the paper are _exactly_ proportional to the times reported, because they were derived from them.

I'll leave it to you to do the computation and see whether or not you like the numbers. Here is just one sample; I can send you a log if you want. Note that this was run on my dual, so I would expect it to run about 2x faster even though it reports 4 cpus (this is hyper-threading).

log.001: time=27.48 cpu=99% mat=0 n=28197633 fh=92% nps=1.03M
log.002: time=13.50 cpu=387% mat=0 n=28512307 fh=92% nps=2.11M

Now, the above is for a fixed search depth. Real data:

1cpu time=27.48
4cpu time=13.50
speedup=2.0
actual 4cpu nodes 28512307
computed 4cpu nodes 28350000

You be the judge of how "fake" that last number is.
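The recipe above (speedup from fixed-depth times, node count reconstructed as NPS × time) can be checked with a short Python sketch. It uses the log.001/log.002 sample values from this post; the function names are hypothetical, and it assumes the "rough NPS" is the logged 4-cpu value rounded to one decimal place in millions (2.1M), which is what reproduces the 28350000 figure. It also shows why dividing a reconstructed count by the time gives back exactly the rounded NPS.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Fixed-depth speedup, rounded to the nearest tenth as described in the post."""
    return round(t_serial / t_parallel, 1)


def derived_nodes(time_s: float, nps_millions: float) -> int:
    """Hypothetical helper: rebuild a node count as (NPS rounded to 0.1M) * time."""
    return round(round(nps_millions, 1) * 1_000_000 * time_s)


# Sample values from the log.001 / log.002 lines in the post.
t1, t4 = 27.48, 13.50
actual_4cpu_nodes = 28_512_307   # n= on the log.002 line
nps_4cpu = 2.11                  # nps= on the log.002 line, in millions

s = speedup(t1, t4)
computed = derived_nodes(t4, nps_4cpu)
rel_err = abs(computed - actual_4cpu_nodes) / actual_4cpu_nodes

print(s, computed, f"{rel_err:.2%}")
# Dividing the reconstructed count by the time returns the rounded NPS exactly,
# while the logged count divided by the time does not.
print(computed / t4, actual_4cpu_nodes / t4)
```

The reconstruction lands within about half a percent of the logged count, and the exact NPS round trip is simply a consequence of how the number was built, not evidence that it was invented.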
The only difference is that if you divide the actual node count by the time, you get some number, while if you divide the computed node count by the time, you get _exactly_ the reported NPS. That is what Vincent went off about. My data was simply off if you wanted more than one decimal place of precision, because it was derived from two numbers, one of which was accurate to only _one_ decimal place. If you believe that is "faking", then more power to you.

As for what Vincent believes, I may one day post an email or two here that _really_ explain his problem. And they will _really_ show his moral standards for reporting results. He wrote me once saying that he was trying to convince some agency to give him time, and he _knew_ he couldn't produce the kinds of speedups I did on the Cray. He said he saw two choices: (1) discredit my results, or (2) explain why current programs can't produce decent speedups, and he wanted to blame this on null-move. I pointed out that null-move did _not_ make a significant difference, and I ran tests for him to show this, even though he was claiming the opposite everywhere. I also pointed out that a NUMA machine would _never_ approach the performance of a pure SMP machine, but he simply could not grasp that idea and it went nowhere.

So, since he couldn't figure out a way to justify his poor results, which were mainly the result of a poor architecture, he chose to try to discredit results that were better than his. And he talks about _me_ trying to commit academic fraud. He _really_ needs to look in the mirror. The emails I saved are much more revealing of his true character, as he was clearly intent on using that machine, period. His main goal was not to do good science, but to do whatever it took to impress his "sponsor". When you think about it, that is _not_ the way to do research.

>
>Hyatt gave an explanation but the problem is that the explanation was given too
>late and not at the time of the publication.
As I said, do the above computations, _then_ decide whether the node numbers are wrong enough to even consider. IE my speedup numbers are rounded to the nearest tenth. They _could_ have been published to 9 decimal places. Would that have enhanced anything? The node numbers _could_ have been rounded to the nearest one hundred thousand. Would _that_ have made them wrong? Would it have made _any_ difference to the paper, which didn't even discuss the node counts specifically? First do the math, _then_ decide what is significant and what is not. I've already done that, and if you do the computation above, you might have a different opinion.

>
>I usually believe that data is correct but if Bob Hyatt remembers to give more
>information only after people find mistakes then we can wonder and suspect that
>some more information is hidden and it is a reason to have doubts about the
>article.

Fine, then simply ignore it. If you believe _that_, then you probably also think the speedup numbers are wrong, simply because they were rounded and those 1/1000ths are important. I, however, _know_ that even the 1/10ths are meaningless in parallel speedups.

>
>Note that I do not claim that data that is calculated based on interpolation is
>a mistake, but not mentioning it in time is a mistake.
>
>Uri

Why don't you re-read the article, and notice that the node counts are not discussed anywhere in the text of the paper. Why? Because they were not _in_ the paper's original contents. They are _meaningless_ in the context of the paper, and the only thing they show is that a 2-cpu search usually searches a larger tree than a 1-cpu search.
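To put a number on how much precision the derivation can lose, here is a small Python sketch of the worst-case error when nodes are rebuilt as reported NPS × time. It assumes the NPS was rounded to the nearest 0.1M (2.1M for the 4-cpu sample in this post) and ignores the much smaller rounding of the time itself; the function name and its `step` parameter are hypothetical.

```python
def reconstruction_error_bound(time_s: float, nps_reported: float,
                               step: float = 100_000.0) -> tuple[float, float]:
    """Worst-case error of nodes = nps_reported * time_s when NPS is rounded to `step`.

    Rounding to the nearest `step` leaves the true NPS within step/2 of the
    reported value, so the rebuilt node count is off by at most
    (step/2) * time_s nodes. The relative bound is taken against the reported NPS.
    """
    half_step = step / 2
    abs_bound = half_step * time_s
    rel_bound = half_step / nps_reported
    return abs_bound, rel_bound


# 4-cpu sample: 13.50 s at a reported 2.1M NPS (one decimal place, in millions).
abs_bound, rel_bound = reconstruction_error_bound(13.50, 2_100_000.0)
observed = abs(28_350_000 - 28_512_307)  # rebuilt count vs. count on the log.002 line
print(f"bound: +/-{abs_bound:,.0f} nodes ({rel_bound:.1%}); observed: {observed:,} nodes")
```

Under these assumptions the bound is a couple of percent, and the observed gap on the sample sits well inside it; this is the same order of imprecision as rounding the node counts to the nearest hundred thousand or so.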
Without the node values, you could make either of two assumptions and each _could_ be correct: (1) the speedup was less than optimal because processors spent much of their time waiting on each other instead of doing useful work; (2) the speedup was less than optimal because processors were busy searching all the time, but in the parallel search they searched a larger tree than the serial search. _That_ was why the node counts were originally requested. And the numbers given are _perfect_ for showing that (2) was the case in Cray Blitz (and it is also the case for Crafty and every other parallel search program I have seen, until you get to NUMA, where (2) is _still_ a major factor but (1) suddenly becomes measurable too, due to memory latency).

So the node numbers were requested even though the paper _clearly_ states that in CB, _all_ cpus search _all_ the time. A CPU _never_ sits idle waiting for something to do for more than a few milliseconds out of 4-5 minutes of total time. You could take the node numbers out of the paper and it would still tell the _same_ story, _perfectly_. And in the review process, only _one_ reviewer even wanted me to go back and add the node numbers. The point is that the node counts were not needed in the context of the paper as written; they were simply requested for conformity with what _other_ authors had published on parallel search.

Had Vincent bothered to look at my dissertation, which I pointed him to multiple times, he would have seen _real_ numbers from front to back, and in the case of those numbers, I still have the original printed log files in my office, so they were not lost. Of course, he wouldn't let a little data get in the way of his quest to get access to a big machine, so that really didn't matter much.

But form your _own_ opinion. If you think I fake data, then by all means ignore anything further I write/post here.
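The distinction between cases (1) and (2) can be made concrete with a simple identity: since NPS = nodes / time, a fixed-depth speedup t_1/t_n factors exactly into NPS scaling divided by search overhead (the ratio of tree sizes). The Python sketch below applies it to the log.001/log.002 sample from this post; the `Run` type and `decompose_speedup` function are hypothetical names. On that hyper-threaded dual, NPS scaling is limited by the hardware rather than by idle processors, so treat it as an illustration of the arithmetic, not a Cray Blitz measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One fixed-depth search: wall-clock time and total nodes searched."""
    time_s: float
    nodes: int

    @property
    def nps(self) -> float:
        return self.nodes / self.time_s


def decompose_speedup(serial: Run, parallel: Run) -> dict[str, float]:
    """Split a fixed-depth speedup into NPS scaling and search overhead.

    Because nps = nodes / time, speedup == nps_scaling / search_overhead
    holds exactly, up to floating-point rounding.
    """
    speedup = serial.time_s / parallel.time_s
    nps_scaling = parallel.nps / serial.nps            # how much faster nodes are produced
    search_overhead = parallel.nodes / serial.nodes    # > 1.0 means a larger parallel tree
    return {
        "speedup": speedup,
        "nps_scaling": nps_scaling,
        "search_overhead": search_overhead,
    }


one_cpu = Run(time_s=27.48, nodes=28_197_633)    # log.001
four_cpu = Run(time_s=13.50, nodes=28_512_307)   # log.002 (dual, hyper-threaded)
parts = decompose_speedup(one_cpu, four_cpu)
for name, value in parts.items():
    print(f"{name:16s} {value:.4f}")
```

Here NPS roughly doubles while the tree grows by about 1%, which is the pattern of case (2). Idle processors, case (1), would instead show up as NPS scaling well below the number of physical CPUs doing work.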
My own view is that the real fake here is Vincent (SOS kills Crafty, any debugged program kills Crafty, his parallel speedup is always > 2 for 2 cpus, etc.), since his results are _never_ reproduced by anyone. I produced a test run that had 4 processors running 3.1x faster. He ran a test where he got _no_ speedup using Crafty. I ran his test positions and got a speedup of 3.0x. He said "aha, your 3.1x number is phony, you only got 3.0x." Of course, he ignored that _he_ had reported that I got 1.01x or some such nonsense, something that _nobody_ has ever reproduced.

There is "faking" and there is "FAKING". I don't consider extrapolation to be "faking" at all. Of course it _should_ have been mentioned in the paper. But it was added after the fact, and during that part of the review process we were actively _reducing_ the size of the paper, not thinking about _adding_ more text, because the editors asked us to "keep it as short as possible."

I'm not going to keep repeating this explanation. If we use the term "fraud", it ought to be associated with someone who does it _all_ the time (Vincent) rather than me. I don't consider that paper "fraud" at all. I considered the node counts so unimportant that I didn't even remember how we had done 'em until after the issue was raised. It was _that_ unimportant to me, since the numbers were _that_ unimportant to the paper as written.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.