Author: Gerd Isenberg
Date: 03:30:50 12/17/02
Go up one level in this thread
On December 17, 2002 at 05:08:25, Matt Taylor wrote: >On December 17, 2002 at 03:58:32, Gerd Isenberg wrote: > >>On December 17, 2002 at 01:47:27, Walter Faxon wrote: >> >>>On December 16, 2002 at 15:05:56, Gerd Isenberg wrote: >>> >>>>Hi all, >>>> >>>>My conclusion with Athlons bit scan (and reset) routines so far: >>>> >>>>Inlined bsf is the fastest for IsiChess, because of shortest code! >>>> >>>>Walter Faxon's magic LSB_64 and bitSearchAndReset_MMX (both faster not inlined) >>>>both produce a about 3-5% slower IsiChess. LSB_64 and bitSearchAndReset_MMX are >>>>equal in IsiChess. In a dumb-loop test bitSearchAndReset_MMX is clearly the >>>>fastest (14.9 seconds) for 10^8 * 10 bits resets. >>>> >>><snip> >>> >>> >>>It's understandable that even if bsf itself is relatively slow, its not needing >>>so many registers to do its work may more than make up the difference. >>> >>>Still, I'm confused. If the other two routines are not inlined, there must be >>>at least a little procedure call overhead, right? The registers they need will >>>have to be saved/restored either way. How can _not_ inlining produce faster >>>code? Matt, help! >>> >>>-- Walter >> >>Hi Walter, >> >>it depends on the program of course. In IsiChess inlined LSB_64 is definitely >>slower than _not_ inlined. Even code must be read into L1-cache, and more code >>implies more memory reads - there are so many "chaoic" interferences, at least >>in IsiChess, but i guess also in other programs. >> >>Cheers, >>Gerd > >Shouldn't be a big issue, though. They aggressively speculatively fetch code >into the L1 cache, and the cache is huge. AMD advises in all the optimization >guides to "aggressively unroll loops [because of large cache size]." > >The register contention seems more likely IMO... > >-Matt Hi Matt, Yes, even amd's optimization guide says: "Always Inline Functions with Fewer than 25 Machine Instructions" which is most often true, but it seems not always, at leat in my program ;-) The bitSearchAndReset_MMX (MM0) uses only eax as general purpose register, and has only 23 (20) mmx-instructions, but IsiChess is a few percent slower with inlined rather than not inlined versions of this routines. The opposite is true with the bsf routine, here the inlined version is clearly favourably. As i said before, there are so many "chaoic" interferences... In general i get better results with space rather than speed optimization. Gerd
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.