Author: Vincent Diepeveen
Date: 07:55:17 12/24/02
On December 24, 2002 at 04:47:21, Frank Phillips wrote:

>On December 23, 2002 at 12:12:29, Vincent Diepeveen wrote:
>
>>On December 23, 2002 at 12:01:10, Robert Hyatt wrote:
>>
>>You forget the crucial data point and that's that you have
>>no AMD K7s out there.
>
>Intel is about 30% faster than gcc3.2 (and gcc2.95 and gcc 2.96) with profile
>guided optimisation for me on my AMD Athlons (Palomino and Thoroughbred). Not
>all of this can be due to incompetence, I suggest.
>

Frank, you must be using bitboards then. No other option is possible.
Profile-guided optimization speeds me up 20% on the K7 with gcc 3.x.

>
>>
>>>On December 23, 2002 at 09:45:25, Vincent Diepeveen wrote:
>>>
>>>>On December 22, 2002 at 08:42:23, Joel wrote:
>>>>
>>>>>Hey all,
>>>>>
>>>>>Was reading some of the previous threads where the general consensus seemed to
>>>>>be that the Intel C++ 7.0 compiler did a much better job at optimising than the
>>>>>VC 6.0 Sp4 compiler did.
>>>>
>>>>at intel hardware i do not doubt it.
>>>>
>>>>But it is at your own risk of course whether it is producing correctly
>>>>working code for all of your users who also have k7s and perhaps assume
>>>>the program must not crash.
>>>>
>>>>At my K7 the intel compiler crashes time and time again. Also it's slower
>>>>than the gcc compiler when using branch profile info (-fbranch-probabilities)
>>>>after first generating the info.
>>>>
>>>>intel without that branch profile info is just like gcc without that info
>>>>slower at the k7 than msvc 6 sp4 processor pack.
>>>>
>>>>the processor pack is crucial for sp4 because it adds a 2% in speed
>>>>and the speed differences between default gcc compile and intel c++ compiles
>>>>versus msvc sp4 with the procpack is measured at 1% and 0.5%
>>>>
>>>>but then that profile info increasing the speed for gcc (which is a
>>>>time consuming thing, also for the intel compiler of course) is giving
>>>>an additional 20% speedup blowing away the other compilers.
>>>>
>>>>Now let's touch correctness. For a long period of time GCC was a very bad
>>>>compiler. Especially many 2.96 versions were very broken. And very buggy.
>>>>
>>>>Before the 2.95.x versions also there were numerous bugs in gcc with regards
>>>>to parallel behaviour (i use 'volatile' variables a lot because diep is
>>>>SMP). Also they were dead slow. the 2.95.x versions are dead slow for me
>>>>when compared to a default msvc 6 compile. Like 12.5% difference is
>>>>no coincidence at a k7. And 10% at a P3.
>>>>
>>>>But the 3.xx versions are great. If i understand well AMD contributed to
>>>>some linux distributions money in order to improve the gcc compiler for
>>>>their processors. Of course i have no exact info here i just read around
>>>>at the internet for this.
>>>>
>>>>But the sad thing is that an old 586 compiler msvc6 with a processor pack
>>>>that just speeds it up 2% is faster on AMD hardware than the most recent
>>>>compilers without that reordering pass.
>>>>
>>>>Of course this is for DIEP.
>>>>
>>>>Crafty uses weird 64 bits structures called bitboards it is trivial that
>>>>older compilers didn't know how to emulate that very well on 32 bits
>>>>processors. It's here only where Bob can claim the intel compiler is
>>>>fast for him.
>>>
>>>
>>>I have absolutely no idea what you are talking about. Every gcc compiler
>>>after 2.5 worked perfectly with long longs. As does every compiler I have
>>>tried in the past 5 years porting crafty to every unix machine made.
>>>
>>>The intel compiler _is_ faster than gcc 3 for me. And for everyone here at
>>>UAB that has tested the two.
>>>
>>>Too many data points from others, only one from you. I tend to believe the
>>>majority.
>>>
>>>
>>>
>>>>
>>>>for GCC i use next format to compile:
>>>>
>>>>CFLAGS = -pg -fprofile-arcs -O2 -march=athlon -mcpu=athlon -frename-registers
>>>>-DUNIXPII -fno-gcse -foptimize-register-move
>>>>
>>>>then i run diep for half an hour.
>>>>
>>>>then i recompile it using:
>>>>
>>>>CFLAGS = -O2 -march=athlon -fbranch-probabilities -frename-registers
>>>>-DUNIXPII -fomit-frame-pointer -fno-gcse -foptimize-register-move
>>>>
>>>>in case of boundschecking:
>>>>
>>>>#CFLAGS = -g -DUNIXPII -O2 -fbounds-checking -Wall
>>>>
>>>># intel c++ now
>>>>#CC = icc
>>>>#CPP = icc
>>>>#CFLAGS = -g -DUNIXPII
>>>>#CFLAGS = -O3 -tpp6 -axi -xi -rcd -prof_genx -DUNIXPII
>>>>#CFLAGS = -O3 -tpp6 -axi -xi -rcd -prof_use -DUNIXPII
>>>>
>>>>Best regards,
>>>>Vincent
>>>>
>>>>>My compiler knowledge is very limited - I have written a C compiler before (uni
>>>>>assignment), but optimisation wasn't an issue. I have no real idea how an
>>>>>optimising compiler goes about it's work.
>>>>>
>>>>>For the record I have an Athlon XP 2100+, and my engine is bitboard based.
>>>>>
>>>>>Having said that, I installed the Intel compiler, and tried compiling my latest
>>>>>version of Bodo, and then ran my dodgy little speed benchmark on it. It was
>>>>>actually slower than the VC 6.0 compiler, though I have reason to suspect my
>>>>>incompetence is the issue, largely due to statements like:
>>>>>
>>>>>"Did you use the intel C++ 7.0? Of course not. Did you do the profile-feedback
>>>>>optimizations? Probably not."
>>>>
>>>>>What I am asking is how do I do this profile-feedback optimisations, and or any
>>>>>other optimisations which you guys do?
>>>>
>>>>>What would be particularly helpful is other people could give me the compiler
>>>>>command line parameters they use to generate fast code.
>>>>
>>>>>I really need to buy a book on optimising compilers so I understand what the
>>>>>hell is happening here. :|
>>>>
>>>>>Any help greatly appreciated,
>>>>>Joel Veness
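
For anyone who wants to try the two-pass gcc build quoted above on something small before pointing it at a whole engine, here is a minimal sketch in C. Only the `-fprofile-arcs` and `-fbranch-probabilities` flags come from the thread. The file name `pgo_demo.c`, the function `score_positions` and the `PGO_DEMO_MAIN` macro are made up for illustration and have nothing to do with DIEP, Crafty or Bodo. The loop has one heavily skewed branch, which is the kind of thing branch profile info is supposed to help with; whether it gives any measurable speedup on a given gcc version and CPU is something you would have to time yourself.

```c
/* pgo_demo.c: toy workload for trying a two-pass profile-feedback build.
 *
 * Assumed build steps (file name and -DPGO_DEMO_MAIN are hypothetical,
 * the two profile flags are the ones quoted in the thread):
 *
 *   pass 1: gcc -O2 -fprofile-arcs -DPGO_DEMO_MAIN pgo_demo.c -o pgo_demo
 *           ./pgo_demo            (writes the arc profile data)
 *   pass 2: gcc -O2 -fbranch-probabilities -DPGO_DEMO_MAIN pgo_demo.c -o pgo_demo
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an engine's hot loop.
 * A simple linear congruential generator drives one skewed branch:
 * the "rare" arm is taken when the top 6 bits of the state are zero,
 * i.e. for roughly 1 in 64 iterations. */
uint32_t score_positions(uint32_t n)
{
    uint32_t state = 12345u;
    uint32_t total = 0;
    uint32_t i;

    for (i = 0; i < n; i++) {
        state = state * 1664525u + 1013904223u; /* unsigned wraparound is well defined */
        if ((state >> 26) == 0) {
            total += 100;   /* rare arm */
        } else {
            total += 1;     /* common arm */
        }
    }
    return total;
}

#ifdef PGO_DEMO_MAIN
int main(void)
{
    const uint32_t n = 1000000u;
    uint32_t total = score_positions(n);

    /* Sanity check: every iteration adds at least 1. */
    assert(total >= n);
    printf("total = %lu\n", (unsigned long)total);
    return 0;
}
#endif
```

The point of running the instrumented binary between the two compiles is the same as running diep for half an hour in the quoted recipe: the second compile can only use branch counts that the first binary actually recorded, so the training run should look like real use.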
Last modified: Thu, 15 Apr 21 08:11:13 -0700