Author: Tom Likens
Date: 13:26:08 02/05/02
On February 05, 2002 at 15:31:25, Dann Corbit wrote:

>On February 05, 2002 at 13:13:09, Miguel A. Ballicora wrote:
>[snip]
>>>>   register int n=(count+7)/8;
>>>>
>>>>   switch(count & 7){
>>>>   case 0: do { *to++ = *from++;
>>>>   case 7:      *to++ = *from++;
>>>>   case 6:      *to++ = *from++;
>>>>   case 5:      *to++ = *from++;
>>>>   case 4:      *to++ = *from++;
>>>>   case 3:      *to++ = *from++;
>>>>   case 2:      *to++ = *from++;
>>>>   case 1:      *to++ = *from++;
>>>>           } while(--n > 0);
>>>>   }
>>>>
>>>
>>>Nice -:(
>>>
>>>The "do ... while" in the switch statement is really driving me crazy.
>>>
>>>Regards, Uli
>>
>>I will check K&RII later but IIRC case 0: can be understood as a label
>>and I just figure that the same will be with "do"
>>and "while ()" would be the same as "if () goto".
>>You can jump right into the middle of this loop, crazy isn't it?
>
>It's found in the C FAQ:
>20.35:  What is "Duff's Device"?
>
>A:  It's a devastatingly deviously unrolled byte-copying loop,
>    devised by Tom Duff while he was at Lucasfilm.  In its "classic"
>    form, it looks like:
>
>        register n = (count + 7) / 8;   /* count > 0 assumed */
>        switch (count % 8)
>        {
>        case 0: do { *to = *from++;
>        case 7:      *to = *from++;
>        case 6:      *to = *from++;
>        case 5:      *to = *from++;
>        case 4:      *to = *from++;
>        case 3:      *to = *from++;
>        case 2:      *to = *from++;
>        case 1:      *to = *from++;
>                } while (--n > 0);
>        }
>
>    where count bytes are to be copied from the array pointed to by
>    from to the memory location pointed to by to (which is a
>    memory-mapped device output register, which is why to isn't
>    incremented).  It solves the problem of handling the leftover
>    bytes (when count isn't a multiple of 8) by interleaving a
>    switch statement with the loop which copies bytes 8 at a time.
>    (Believe it or not, it *is* legal to have case labels buried
>    within blocks nested in a switch statement like this.  In his
>    announcement of the technique to C's developers and the world,
>    Duff noted that C's switch syntax, in particular its "fall
>    through" behavior, had long been controversial, and that "This
>    code forms some sort of argument in that debate, but I'm not
>    sure whether it's for or against.")
>
>From my benchmarking, memcpy() is faster.  Duff's device can be worthwhile in a
>few rare circumstances (e.g. compiler does not know how to unroll and the data
>objects in the array are large).  It can also cause large problems on vector
>machines (e.g. an enormous slowdown rather than a speedup).
>
>For the most part, the days of Duff's device as a useful computing tool are
>past.  All the compilers I use know how to unroll anyway.

It's a useful trick for Duff's original application, writing to a
memory-mapped device (which is why he didn't use memcpy in the first place).
For modern applications Dieter's probably right.  memcpy() is usually inlined
and coded in highly optimized assembly.  It's also easier to understand.
On the other hand, code like Duff's has its own appeal.

I'd be careful with GCC's -funroll-loops (and especially -funroll-all-loops),
since these switches make the code larger and can actually make your program
run slower by causing instruction-cache misses.

regards,
--tom
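
For anyone who wants to compare the two approaches on their own machine, here is a minimal sketch of a memory-to-memory Duff's device (the variant that increments both pointers) next to a plain memcpy() wrapper. The names duff_copy and plain_copy are made up for this illustration, and unlike the classic form the sketch returns early when count is 0. It is not a benchmark, only a way to check that both paths copy the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Memory-to-memory variant of Duff's device: copies count bytes from
 * `from` to `to`, incrementing both pointers.
 * duff_copy is a hypothetical name chosen for this sketch. */
void duff_copy(unsigned char *to, const unsigned char *from, size_t count)
{
    size_t n;

    assert(to != NULL && from != NULL);
    if (count == 0)             /* the classic form assumes count > 0 */
        return;

    n = (count + 7) / 8;        /* number of passes through the do-while */
    switch (count % 8) {        /* jump into the loop body to handle the remainder */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

/* The same copy through the library routine, for comparison.
 * plain_copy is likewise a hypothetical name. */
void plain_copy(unsigned char *to, const unsigned char *from, size_t count)
{
    memcpy(to, from, count);
}
```

In Duff's original case the destination was a single device register, so `to` would not be incremented and memcpy() would not be a drop-in replacement there.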
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.