Author: Matt Taylor
Date: 11:30:15 01/18/03
On January 18, 2003 at 05:23:54, David Rasmussen wrote:

>On January 18, 2003 at 05:19:14, Matt Taylor wrote:
>
>>On January 18, 2003 at 05:02:44, David Rasmussen wrote:
>>
>>>On January 18, 2003 at 05:00:16, Matt Taylor wrote:
>>>
>>>>
>>>>64-bit shift in something like 3 cycles when count is < 32. Pentium 4 L1 cache
>>>>latency -- 2 clocks. Athlon L1 cache latency -- 3 clocks.
>>>
>>>I don't understand this
>>
>>Yeah, broken English is a bad thing. Here's how it breaks down:
>>
>>Shift form -- 3 cycles
>>
>>Table (Athlon) -- 3 cycles
>>Table (Pentium 4) -- 2 cycles
>>
>>>>Bad performance of former indicative of poor optimization.
>>>
>>>I don't understand this
>>
>>It means that the compiler is not doing its job. Perhaps there is an
>>optimization switch you have not enabled. Perhaps your inline function is not in
>>a header file where the C compiler can get at it. Perhaps the compiler just
>>flops on 64-bit code.
>>
>>It means basically that something is wrong. There should be almost no difference
>>in speed.
>>
>>-Matt
>
>I see. It does about the same with both Intel C++ and MSVC 7

Perhaps they are generating inefficient code. I know MSVC 7 chokes badly on 64-bit
code. Multiply and divide go to library calls, and I think 64-bit shift does too.
In that case, try the assembly I posted (noting the redundancy that Dieter
pointed out in the gcc version).

I find it truly sad if the compiler cannot make enough assumptions to generate
decent code for 1ull << count. (Though I can sympathize with the compiler
writers a little bit.)

-Matt
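For readers following along, here is a minimal C sketch of the two forms being compared: the shift form (1ull << count) and the table form (one load, which is where the L1 latency figures come in). The third function splits the shift into 32-bit halves by hand, which is one way to help a 32-bit compiler that would otherwise call a library routine. All the names (mask_shift, mask_table, init_mask_table, mask_lookup, mask_split) are made up for illustration. This is not the assembly mentioned above, and it makes no claim about what any particular compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Shift form: build the single-bit mask directly.
   Shifting a 64-bit value by 64 or more is undefined in C,
   so the count must be in the range 0..63. */
static inline uint64_t mask_shift(unsigned sq)
{
    assert(sq < 64);
    return UINT64_C(1) << sq;
}

/* Table form: one memory load per mask instead of a shift. */
static uint64_t mask_table[64];

/* Fill the table once at startup (hypothetical helper). */
static void init_mask_table(void)
{
    unsigned i;
    for (i = 0; i < 64; ++i)
        mask_table[i] = UINT64_C(1) << i;
}

static inline uint64_t mask_lookup(unsigned sq)
{
    assert(sq < 64);
    return mask_table[sq];
}

/* Hand-split form: do a 32-bit shift, then place the result
   in the low or high half depending on bit 5 of the count.
   An assumption about what helps a 32-bit target, not a
   measured result. */
static inline uint64_t mask_split(unsigned sq)
{
    uint32_t bit;
    assert(sq < 64);
    bit = UINT32_C(1) << (sq & 31);
    return (sq & 32) ? ((uint64_t)bit << 32) : (uint64_t)bit;
}
```

Call init_mask_table() once before using mask_lookup(). All three functions return the same mask for every count from 0 to 63, so they can be swapped in and timed against each other with whatever compiler switches are in use.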
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.