Author: Vincent Diepeveen
Date: 13:07:35 05/10/02
On May 09, 2002 at 23:53:05, Eugene Nalimov wrote:

>Ok, now, when children are in bed, I can write longer reply. Hopefully you'll
>read that and stop writing posts full of actual errors (I am not arguing with
>you about chess, but our roles reverse when we start talking about compilers,
>Spec2k, or high-performance processors for high-end tasks :-)).
>
>1) There is no single Fortran program in the SpecInt2k. Eleven programs are
>written in C, one in C++.

Sorry, you are right. I might have confused it with SPECfp or something, which I used when writing that much hated article (hated only by hardware company dudes) about business performance being different from CPU potential.

>2) It's much harder to optimize unpredictable branch-heavy integer programs than
>the FP ones. To achieve good performance on FP code you need standard set of
>optimizations -- loop fusion, loop interchange, etc., and those optimizations
>give you nothing on integer code; it's well known what optimizations Sun used to
>tremendously speedup 'art' benchmark, and there is nothing new in those
>optimizations. Additionally, on x86 big win is use of SSE/SSE2 registers -- not
>because of the vectorization, but simply because you have "normal" FP registers,
>not stacked ones.

You're the expert on compilers here; I couldn't say this better.

>3) Spec rules do not allow to modify sources. I.e. it's not possible to rewrite
>some functions into assembly, or manually inline some functions, or replace
>algorithm that is very slow on the particular CPU by better suited one, etc.

What a compiler does is turn a program into an executable, and that is not exactly C code. So in short, if you know a trick to rewrite a certain function where a load of instructions do something and you can replace that by, for example, 2 instructions, then obviously your boss is going to reward you! Without an in-depth study of what the function is doing, such an optimization is obviously not possible.
Just by 'general' studies this won't happen. Of course all compiler teams have a good look at what the programs do, and try to improve their compiler on that specific program. They would be insane if they didn't. They can possibly sell more CPUs if the compiler optimizes that SPECint2000 test set better. So there is a few-billion-dollar reason to optimize for specific programs.

>4) SMT/CMP is very helpful for some programs and absolutely useless for others.
>For example, speeding up large database server by 20% by using SMT makes a lot
>of sense. I regularly run several CPU-intensive tasks simultaneously, and here
>that speedup will be noticeable, too. I believe that even your chess program can
>benefit from SMT.

In the long run we do not disagree here. I doubt the current generation of CPUs on the market can profit from this. Right now the P4 gets marketed as a good SMT/CMP (whatever the abbreviation is) processor. AMD has also picked up this marketing hype and seems to be using it.

Just like my 1.2 litre Opel Corsa has an 'i' behind it. It is a 1.2i motor, 'i' for injection. hehehehehehehehe

Injection probably gets used a lot in the auto industry, but not in such a small engine :) For the same reason the current small PC processors cannot use it.

What is very important with hypes is that it is very hard to prove the profit from them. If Intel/AMD releases a new CPU which is exactly 2x faster than their current CPU, is it faster because of SMT or because it is a new CPU?

In short, if I believed the hype for current CPUs, then running DIEP with 2 search processes on a single CPU should give me 20% more nps than searching with 1. Well, it doesn't. It doesn't even get 1% faster, in fact.

>5) Once again: during development of P4 Intel made some decisions that are
>"future-oriented", i.e. allows Intel to use better process, or higher clock
>frequency, or both.
>Once again: I predict that in the near future Crafty will be
>faster on the fastest shipping P4 than on the fastest shipping AMD processor.
>Let's wait and see.

I am not saying the P4 is a bad CPU for the future. Some ideas out of it will no doubt be used very well in a new and better processor, like the trace cache idea. I am not someone who can look into a crystal ball and predict the future.

Too bad Intel's own specifications are not met by the P4. On paper it can do 4 instructions a clock. So when it was released I was very happy. Then some smart hardware guys proved it could never do 4 instructions a clock because of some things in the processor that prevent it from ever doing that. That was a BIG disappointment for me. A lot of money invested for nothing!

Then I understood that bigger bandwidth and a high clock rate had big marketing value with respect to Quake 3. And they succeeded in this.

However, for chess in the near future it is very clear whether the P4 or the K7 is better. Even a superb compiler team from Intel will not be able to fix that.

In nearly all postings I did here I always very clearly said it was DIEP which I used to benchmark. About my encryption software you will not find any statements, because it is designed to run on fast big machines, not on PCs; though I could make some really hard statements here about which processor I like and which one I dislike. I will not do it. I will not even give a kick.

In this group we do not discuss whether Oracle gets 1% faster on a new P4 with a new compiler. We discuss computerchess. Let's keep the discussion focused on that.

>6) I am deliberately not discussing quality of Intel (or MS) compiler, not
>comparing them, not writing my own opinion on them.
>
>Regards,
>Eugene

Thanks,
Vincent

>On May 09, 2002 at 22:45:33, Vincent Diepeveen wrote:
>
>>On May 09, 2002 at 12:51:22, Eugene Nalimov wrote:
>>
>>Eugene,
>>
>>being in the compiler team you know the BS you write below.
>>specint is not only crafty. it is also outdated Fortran programs
>>where a bit of smart optimizing compilers have the edge.
>>
>>In fact some engineers recently managed a SEVEN times speedup
>>of a particular program.
>>
>>In short the P4 is a complete joke from computerchess viewpoint
>>and business applications, with exceptions of the source code
>>you can get in your hands.
>>
>>I need to note that the beloved intel c++ compiler is creating
>>completely illegal code with certain optimizations which i see
>>getting used at the testsets for your beloved P4.
>>
>>Let's focus upon computerchess. There is a major joke the P4,
>>even if it gets clocked to 10 Ghz.
>>
>>Let's not discuss even getting more than 1 thread running on
>>a single P4 processor. Another insider joke, which amazingly
>>is getting sold as getting the processor 20% faster, though
>>AMD is also busy with that for now nonsense marketing hype ;)
>>
>>>Sorry Vincent, you are as always only partially right :-)
>>>
>>>Let's look at the SpecInt2k number. It's geometric mean of the 12 *real-world*
>>>integer programs, one of them is old Crafty version.
>>>
>>>Best SpecInt2k for AMD I was able to found on www.spec.org is 720 base / 749
>>>peak for Athlon XP 2100+. Best official Pentium 4 result is 819 base / 833 peak
>>>for 2.4GHz processor. Unofficial (not yet submitted to SPEC) result for Pentium
>>>4 2.53GHz is 882 base.
>>>
>>>So, for mix of real-world programs, Pentium 4 is definitely better. You can
>>>compare results yourself:
>>>
>>>http://www.spec.org/osg/cpu2000/results/res2002q2/cpu2000-20020422-01326.asc
>>>http://www.spec.org/osg/cpu2000/results/res2002q2/cpu2000-20020401-01279.asc
>>
>>>Of course YMMV. You can be unhappy person who need to run the application that
>>>is slow on Pentium 4. So let's look at the individual result: Crafty on P4/2.4
>>>runs 123 seconds. Crafty on AMD/2100+ runs 98 seconds. I.e. ~25% slower.
>>
>>You're doing statements math wrong.
>>
>>1.25 x 2.4Ghz/1.73Ghz = 1.734 ==> 73.4% faster is the AMD a Mhz.
>>
>>Even more than 70%!
>>
>>>Definitely less than 70% you are writing everywhere.
>>>My prediction is that with the widening clock speed difference (caused by design
>>
>>With 73.4% difference at 'widening clock speeds' and knowing 0.13 micron
>>K7 is nearly in the shops, let's assume end of this year they reach
>>2.53Ghz too with the K7 0.13, just like the P4 0.13 is hitting 2.53Ghz now
>>too.
>>
>>Assuming linear performance (which isn't true, not for K7 and not for
>>P4 either, so in fact it must be even a faster cpu, we just calculate
>>a bound here):
>>  2.53 Ghz 0.13 K7 x 1.734 = 4.4Ghz
>>
>>So the P4 needs to get released over 4.4Ghz to beat a K7 at 2.53Ghz
>>assuming linear extrapolation. Reality is of course that it's more like
>>6Ghz than it is 4.4Ghz.
>>
>>See the problem for the P4 in the future?
>>
>>The 3.5Ghz is announced for start of 2003 to get on the market.
>>Realistically before the end of the year we'll have a 2.2Ghz K7
>>though at the market.
>>
>>3.5Ghz P4 / 1.734 ==> 2Ghz
>>
>>So if at the time the 3.5Ghz P4 is released, the AMD factories
>>released a 2Ghz K7, then you again have a problem.
>>
>>
>>>decisions Intel made during P4 development) we'll soon see P4 that runs Crafty
>>>faster than any shipping AMD processor.
>>
>>I don't doubt you find another few sneaky optimizations that speedup
>>crafty. In fact from my head i already know some routines which if
>>ported to assembly will give crafty 10% speed boost.
>>
>>Starting up their own compiler team was of course a very smart decision
>>from intel. It's giving the intel processors a boost in the same way
>>the 'supercomputer processors' in the past looked better than they
>>were.
>>
>>One thing even the best compiler team can't take away is a bottleneck
>>like a 1024 word L1 datacache of the P4.
>>
>>That's a row of 32 x 32 words or so. Real little in nowadays computing!
>>
>>Anyway, glad i'm not in your situation.
>>Must be impossible to build an
>>even better compiler version for the intel hardware than it is doing
>>now, only program specific optimizations are possible now :)
>>
>>>Eugene
>>>
>>>On May 09, 2002 at 00:35:10, Vincent Diepeveen wrote:
>>>
>>>>On May 08, 2002 at 03:19:50, Slater Wold wrote:
>>>>
>>>>any big company in USA has a dude lurking around here.
>>>>computerchess is in specint2000 remember?
>>>>
>>>>apart from that many people are interested in computerchess.
>>>>
>>>>big chance about 10 people check regularly here who work for m$,
>>>>about 3 i could mention from head who might work for intel. And
>>>>another one if i remember well AMD and the list goes on.
>>>>
>>>>Whatever happens, support from intel is great compared to
>>>>AMD for example. Yes AMD is the superb processor, no doubt.
>>>>Even a good 'cheating' compiler (cheating in the sense that
>>>>it isn't trying any trivial thing to get fast on the AMD K7
>>>>processor) which cheats by about 10%.
>>>>
>>>>Despite that, they still get kicked butt by AMD processors.
>>>>
>>>>If you however consider the good support from their helpdesk,
>>>>the fact that they can press a 2.53Ghz sticker onto the new
>>>>northwood whereas AMD only can stick a 1.73Ghz sticker on the
>>>>2100MP (which somehow nowhere can be bought yet in europe like
>>>>the 2.53 northwood can't get bought), the fact they have
>>>>their own compiler, then you know they last forever.
>>>>
>>>>AMD still has to develop their own compiler, or they will
>>>>go run behind soon.
>>>>
>>>>any P4 news doesn't interest me much till they fixed the 8KB L1
>>>>cache (the reason why the processor sucks is also the reason
>>>>it can get clocked so high i guess, well that's the opinion
>>>>of a layman).
>>>>
>>>>real interesting though is the mckinley. Many people at intel and other
>>>>big companies speak about it. So far not a single testresult reached
>>>>me from it. DIEP in this respect isn't even most important.
>>>>I already
>>>>know pretty well how diep is going to do on it when i know the speed
>>>>of crafty on it at a specint test.
>>>>
>>>>AMD will get nowhere at 64 bits world till they have their own compiler.
>>>>
>>>>>I got an e-mail tonite. From Intel. That's a first........
>>>>>
>>>>>Perhaps there are eyes on us everywhere!
>>>>>
>>>>>It was from a "Systems Engineer", telling me how to setup a 2.53Ghz machine
>>>>>*correctly* and that I should have no problems "..beating any AMD CPU on the
>>>>>market, overclocked or not, running any "optimized" program.."
>>>>>
>>>>>I checked the e-mail, it's valid. He stated that he would "..appreciate my
>>>>>cooperation in keeping his name, and this e-mail, as quiet as possible.."
>>>>>Well, I guess this is as quiet as I can keep it. ;)
>>>>>
>>>>>Aaron, he told me they were hitting 300+ fps with the setup he described to me
>>>>>in the e-mail, using a GeForce3 Ti500. He told me using a GeForce4 Ti4600 or
>>>>>the Quadro4 would probably net gains near 30+ fps. You getting anywhere near
>>>>>this number?
>>>>>
>>>>>He also told me that P4's have always been geared towards multimedia. And that
>>>>>it didn't hurt his feelings that AMDs were beating the P4s in "chess
>>>>>applications". He stated, and I quote, "If AMDs audience is those who require
>>>>>good numbers running their chess applications, well, that's good news for
>>>>>Intel."
>>>>>
>>>>>I have no doubts this guy is for real. And I will setup this system as he has
>>>>>"instructed". Whoever turned me in, thanks! ;)
>>>>>
>>>>>To quote a movie, "We now have corporate sponsorship." :D
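
The per-clock comparison argued over in this thread can be reproduced with a few lines of Python. This is only a sketch using the figures quoted above (Crafty at 123 seconds on the 2.4GHz P4, 98 seconds on the Athlon XP 2100+ at 1.73GHz). The function names are made up for illustration, and the projections assume performance scales linearly with clock speed, which the thread itself says is not really true.

```python
# Figures quoted in the thread (SPEC CPU2000 Crafty run times and clocks).
P4_SECONDS = 123.0   # Crafty on Pentium 4 at 2.4 GHz
AMD_SECONDS = 98.0   # Crafty on Athlon XP 2100+
P4_GHZ = 2.4
AMD_GHZ = 1.73       # actual clock of the 2100+


def per_clock_advantage(slow_s, fast_s, slow_ghz, fast_ghz):
    """Work done per clock by the faster chip, relative to the slower one.

    Run-time ratio times clock ratio: the faster chip finishes sooner
    AND does so at a lower clock, so both factors multiply.
    """
    return (slow_s / fast_s) * (slow_ghz / fast_ghz)


def break_even_clock(k7_ghz, advantage):
    """P4 clock needed to match a K7 at k7_ghz (linear scaling assumed)."""
    return k7_ghz * advantage


def matching_clock(p4_ghz, advantage):
    """K7 clock that matches a P4 at p4_ghz (linear scaling assumed)."""
    return p4_ghz / advantage


exact = per_clock_advantage(P4_SECONDS, AMD_SECONDS, P4_GHZ, AMD_GHZ)
rounded = 1.25 * P4_GHZ / AMD_GHZ  # the thread rounds 123/98 to 1.25

print(f"per-clock advantage, exact run times: {exact:.3f}")
print(f"per-clock advantage, rounded to 1.25: {rounded:.3f}")
print(f"P4 clock to match a 2.53 GHz K7: {break_even_clock(2.53, rounded):.2f} GHz")
print(f"K7 clock to match a 3.5 GHz P4:  {matching_clock(3.5, rounded):.2f} GHz")
```

With the unrounded run times the ratio comes out slightly higher than the 1.734 used in the thread, because 123/98 is a little over 1.25. Either way the two factors are multiplied, not added.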