Author: Robert Hyatt
Date: 12:02:55 09/30/02
On September 30, 2002 at 04:36:10, Vincent Diepeveen wrote:

>On September 29, 2002 at 23:38:53, Robert Hyatt wrote:
>
>As usual you lack the technical knowledge to know the
>difference. If your thing is programmed better then
>you get the raw cpu speed, where the AMD is way faster.
>
>Yet you won't listen to this at all, i'm sure of it.
>
>You don't ever even have compiled for an AMD. You don't
>even own one. I own both intel AND amd's.

So? I don't own a Cray either, but I know a _lot_ about how it works
internally. Not owning one does not equal not understanding one...

>
>I am even sure you didn't even do a verification check
>of every move made and every evaluation and what the
>evaluation was of intel versus AMD optimized executables.

I did a 300 position validation comparing gcc 2.95.2 to intel's compiler.
It passed perfectly, node per node. I also tried to do the same test with
3.x compilers (gcc) but it was hopeless as the program generally crashes
instantly, which is a clear hint that all is not well with 3.x gcc versions...

>
>For a 500k nodes run i generate up to 100MB+ logfiles.
>A simple file comparison shows possible bugs in either
>the program or the compiler.

So? I do the same, although I generally avoid dumping the tree unless I
_know_ the node counts are not matching...

>
>Intel c++ has a major problem here. It has bugs.

It has none that I can find. It has none that several molecular modeling
folks here can detect either.

>
>You are too plain lazy to compare the things.

You are the one that is lazy. You _continuously_ make such ridiculous claims,
and when I give proof each time that you are wrong, you just walk off, no
comment, and come back to make another ridiculous claim another day.

For the last time, how about posting some code that Intel's compiler fails on?
The last time you did this, I tried it and it worked perfectly. Making wild
statements and waving your arms madly won't convince me, nor make it true. I
want something _factual_.

So far, you've produced a great deal of hot air and turbulence, but nothing
whatsoever of any substance. I _know_ how many different programs here are
using Intel's compiler. I know how _carefully_ they tested. One person using
the namd package on my cluster ran some tests that took 48+ hours for a run,
and compared the results number by number to gcc. And they matched _perfectly_.

>
>All we see is that for computerchess even your own thing
>kicks butt at AMD single cpu.

What does that have to do with the intel compiler???

>
>Every idiot will understand then that if you are a good programmer
>that it gets nearly 2 times faster then when running SMP at it and
>that it's way faster then.
>
>No hashtable lookup isn't your problem at all.

Actually it is, from using the machine-specific-registers in the Intel
hardware...

>
>>On September 29, 2002 at 11:36:51, Vincent Diepeveen wrote:
>>
>>>On September 28, 2002 at 12:39:05, Robert Hyatt wrote:
>>>
>>>>On September 28, 2002 at 04:28:02, Aaron Gordon wrote:
>>>>
>>>>>On September 27, 2002 at 23:42:03, Robert Hyatt wrote:
>>>>>
>>>>>>I didn't run the SMP tests for AMD, I don't have one here and have no plans
>>>>>>to get one. I posted a chart of data others provided. I don't even remember
>>>>>>which position we used now. All that was significant was that all the speedup
>>>>>>numbers (raw nps, not parallel search times) were in the 1.4-1.5 range with
>>>>>>AMD, and 1.8 and above for the intel boxes...
>>>>>>
>>>>>>I personally believe it highlights a memory bottleneck...
>>>>>
>>>>>I don't think it's fair for you to find the slowest possible binary for the AMD
>>>>>and some IntelC5 binary and then claim that the speedup is slow. I don't think
>>>>>it's fair either if someone takes a slow binary for a P4 and compares it to a
>>>>>fast binary for an AMD cpu.
>>>>
>>>>I'm not doing that. But you are missing the point as we are not comparing
>>>>speeds between AMD and Intel. I run _any_ executable on AMD using one
>>>>cpu, then the same using two, and compute the NPS speedup. I do the same for
>>>>Intel. It won't matter whether the executable is fast or slow, as I am not
>>>>comparing nps between intel and AMD. I am comparing the NPS speedup from 1-2
>>>>cpus on AMD against the NPS speedup from 1-2 cpus in Intel...
>>>
>>>DIEP's speeds at dual AMD is way faster than any dual intel version of it.
>>>
>>>Obviously DIEP is not so dependent like crafty on continuously poking
>>>through the chipset.
>>>
>>
>>Any chance you might actually _read_ a post before you respond to it. This
>>is not about comparing raw speed on intel vs raw speed on AMD. It is _only_
>>discussing the dual-cpu problem everyone has seen that posted numbers the
>>last time you brought this up. A dual AMD gets a smaller boost over a single
>>AMD, than does a dual Intel compared to a single intel.
>>
>>It is about _nothing_ else.
>>
>>Also I have no idea whatsoever about what you might mean about
>>"continuously poking through the chipset" as I don't do that anywhere...
>>
>>
>>>In general it's way harder to tune for AMD than it is for intel, no question
>>>about it, with regard to producing SMP versions of a product.
>>>
>>>However let's take SOS for example and compare speeds dual at AMD versus intel.
>>>
>>>SOS as we both know is not poking at all between the processes. Yet it
>>>gets roughly 1.8 speedup.
>>
>>what is "poking between the processes"? Storing stuff in shared memory?
>>
>>It is _definitely_ doing that...
>>
>>
>>>
>>>Compare shredder, also a good speedup and AMD doing great for it.
>>>
>>>All the compares here are not so fair at all. We can benchmark crafty
>>>single cpu, but benchmarking it > 1 cpu is simply pathetic as the
>>>parallelism is 20 years old.
>>
>>I love those ridiculous statements...
>>
>>Everything but your program is old. Stupid. etc.
>>
>>All that is old, stupid, etc is such remarks, made repeatedly, as if making
>>them so often will make them true. hint: it won't..
>>
>>
>>>
>>>>Fast or slow executables won't make any difference in the _ratio_ I was looking
>>>>at. Slow executable on AMD will still see a proportional speedup. Because the
>>>>raw NPS is not important, the ratio of 1 cpu to 2 cpus is all that counts
>>>>here.. And AMD has problems... Not major problems, but problems nonetheless...
>>>>
>>>>
>>>>>
>>>>>You seem to conveniently forget the benchmarks I've done and other people here
>>>>>have done. Take a look at my latest graph of crafty results:
>>>>>http://speedycpu.dyndns.org/crafty/craftybench4.jpg
>>>>>Note: the P4 2.76GHz is an overclocked 1.8A northwood at 153.5fsb(614MHz RDRAM).
>>>>
>>>>I'm not forgetting _anything_. Benchmark nps does not matter whatsoever to
>>>>_this_ discussion. It is _only_ the ratio of 2 cpu time to 1 cpu time for
>>>>each specific processor. It shows that it is harder to run two cpus wide open
>>>>on AMD than it is on Intel.
>>>>
>>>>
>>>>>
>>>>>Now, the SMP binaries I have are able to produce a 1.7x speedup in the
>>>>>benchmark. You claim the P4's get 1.8x, that's fine. Take the P4-2.76's result
>>>>>(1,120,011 nps) and multiply it by 1.8. You get 2,016,019.8 nps. Not too shabby,
>>>>>right? Well.. take the 1.86Ghz XP and multiply its nps by 1.7 and you get
>>>>>2,035,330.1. Still faster. Now, if you're saying, "Well yadda yadda is
>>>>>overclocked and etc etc". Yeah, and even faster things will be released here
>>>>>shortly. I can guarantee the P4-2.76 w/ 614MHz RDRAM would be as fast or a hair
>>>>>faster than a standard P4-2.8. The AthlonXP at 1.86 would be more around a 2300+
>>>>>if such a thing existed.
>>>>
>>>>
>>>>Again, you are missing the point. I didn't say AMD was _slower_ than Intel
>>>>anywhere. I simply said their two cpu machine does _not_ scale as well as
>>>>the Intel duals. Nothing more, nothing less. That remains an easy to prove
>>>>fact...
>>>>
>>>>
>>>>>
>>>>>Moving on to the future.. P4-3GHz will soon be released as well as the 2800+
>>>>>(being announced on October 1st). Let's do some rough guessing. If a P4 gets
>>>>>1,120,011 nps @ 2.76 it should get about 1,217,403 nps at 3GHz and that's
>>>>>probably still having the RDRAM clocked to insanity. Take the 2.52GHz AthlonXP @
>>>>>1,578,197. At 2133MHz (AthlonXP 2600+) it should do about 1,335,831 nps. Again
>>>>>do 1,335,831 * 1.7 and 1,217,403 * 1.8 and you get:
>>>>>2,270,912.7 nps for the dual XP 2600+ (2.13ghz)
>>>>>2,191,325.4 nps for the dual P4-3GHz.
>>>>
>>>>Maybe or maybe not. But it _still_ doesn't change the fact that the dual AMD
>>>>is less efficient (should optimally be 2x faster than a single) than a dual
>>>>intel...
>>>>
>>>>
>>>>>
>>>>>Since Crafty is pretty linear you know these numbers are very close to the
>>>>>actual results. So far from what I've seen Pentium4's need an entire GHz more
>>>>>and twice the L2 cache just to come close. This is what I call a $500 keychain.
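
The two quantities being argued over in this thread can be kept apart with a few lines of arithmetic. The following is only an illustrative sketch, not code from any of the posters: the function names (`scaling_efficiency`, `projected_dual_nps`, `scale_by_clock`) are hypothetical, the linear clock scaling is the rough assumption made in the quoted post, and the inputs are the nps figures and 1.7x/1.8x speedups quoted above.

```python
# Illustrative sketch only; function names are hypothetical, and the
# numbers are the ones quoted in the thread above.

def scaling_efficiency(speedup, cpus=2):
    """Fraction of the ideal speedup achieved (1.0 means a perfect 2x on two cpus)."""
    return speedup / cpus

def projected_dual_nps(single_nps, speedup):
    """Raw dual-cpu nps estimated as single-cpu nps times the NPS speedup."""
    return single_nps * speedup

def scale_by_clock(nps, from_mhz, to_mhz):
    """Assumes nps scales linearly with clock speed (the thread's rough guess)."""
    return nps * to_mhz / from_mhz

if __name__ == "__main__":
    # Scaling efficiency: the ratio under discussion, independent of raw nps.
    print("Intel dual efficiency:", scaling_efficiency(1.8))
    print("AMD dual efficiency:  ", scaling_efficiency(1.7))

    # Raw throughput: the projection made in the quoted benchmark post.
    p4_3ghz = scale_by_clock(1_120_011, 2760, 3000)
    xp_2600 = scale_by_clock(1_578_197, 2520, 2133)
    print("Dual P4-3GHz nps: ", projected_dual_nps(p4_3ghz, 1.8))
    print("Dual XP 2600+ nps:", projected_dual_nps(xp_2600, 1.7))
```

Under these assumptions the dual AMD can come out ahead on raw nps while still showing the lower scaling efficiency, so the two claims do not contradict each other.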