Computer Chess Club Archives


Subject: Re: One (silly) question about "C"

Author: Gerd Isenberg

Date: 11:13:12 02/06/02


On February 05, 2002 at 08:32:35, Antonio Senatore wrote:

>Hi friends:
>
>I have one array A[n] and another B[n] (both of the same dimension) and I want
>to make A = B without using a loop like
>
>for (i=0; i < n; i++) A[i] = B[i];
>
>My question is if is it possible to do that without using none kind of loops
>(and as I am working in C, I can't work with vectors or to use the lib
>"algorithm")
>
>Thanks in advance
>
>Antonio

A nice technique for unrolling small copy loops in (inline) assembler is a
sequence of movsX instructions (X = b, w, d) followed by a "done" label, used
instead of rep movsX. Subtracting the appropriate n from the label's address
gives an indirect jump target that executes only the last n movsX, copying n
items. In 32-bit code movsb and movsd are one-byte opcodes (movsw also needs an
operand-size prefix, so its entries are two bytes each), so a run of 32 movsd
to copy up to 128 dword-aligned bytes is quite cache friendly and may be faster
than rep movsd.
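
Since the original question was about C, here is a rough C analogue of the same
idea, offered only as a sketch: a switch with fall-through cases stands in for
the computed jump into the movsX run. The function name copy_up_to_8 and the
limit of 8 items are made up for illustration, and whether a compiler turns the
switch into a real jump table depends on the compiler and its settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 32-bit items, 0 <= n <= 8, with no loop.
 * The switch jumps into an unrolled run of copies, much like jumping
 * n instructions before the "done" label in the assembler version.
 * Precondition (assumed, not checked): n <= 8; larger n copies nothing. */
static void copy_up_to_8(uint32_t *dst, const uint32_t *src, size_t n)
{
    dst += n;   /* point just past the last item, like the "done" label */
    src += n;
    switch (n) {
    case 8: dst[-8] = src[-8]; /* fall through */
    case 7: dst[-7] = src[-7]; /* fall through */
    case 6: dst[-6] = src[-6]; /* fall through */
    case 5: dst[-5] = src[-5]; /* fall through */
    case 4: dst[-4] = src[-4]; /* fall through */
    case 3: dst[-3] = src[-3]; /* fall through */
    case 2: dst[-2] = src[-2]; /* fall through */
    case 1: dst[-1] = src[-1]; /* fall through */
    case 0: break;
    default: break;            /* n > 8: outside the sketch's range */
    }
}
```

Calling copy_up_to_8(A, B, 5) would copy B[0]..B[4] into A[0]..A[4] and leave
the rest of A untouched.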

from

http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_739_2983,00.html

AMD Athlon™ Processor x86 Code Optimization:
Example 2: Optimized memcpy() for Any Data Size or Alignment

...
$memcpy_do_align:
 mov ecx,8    ;a trick that's faster than rep movsb...
 sub ecx,edi  ;align destination to qword
 and ecx,111b ;get the low bits
 sub ebx,ecx  ;update copy count
 neg ecx      ;set up to jump into the array
 add ecx,offset $memcpy_align_done
 jmp ecx      ;jump to array of movsb's
 align 4
 movsb
 movsb
 movsb
 movsb
 movsb
 movsb
 movsb
 movsb
$memcpy_align_done: ;destination is dword aligned
 mov ecx,ebx ;number of bytes left to copy
 shr ecx,6 ;get 64-byte block count
 jz $memcpy_ic_2 ;finish the last few bytes
 cmp ecx,IN_CACHE_COPY/64 ;too big 4 cache? use uncached copy
 jae $memcpy_uc_test
...
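
The count computed at the top of the fragment (mov ecx,8 / sub ecx,edi /
and ecx,111b) is the number of bytes needed to bring the destination up to the
next 8-byte boundary. A small C sketch of that arithmetic, with a made-up
function name, might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring: mov ecx,8 / sub ecx,edi / and ecx,111b.
 * Returns how many bytes must be copied before addr is 8-byte aligned
 * (0 if it already is). Unsigned wraparound makes (8 - addr) well defined. */
static size_t bytes_to_qword_align(uintptr_t addr)
{
    return (size_t)((8u - addr) & 7u);
}
```

For example, an address ending in ...1 needs 7 bytes, one ending in ...4 needs
4, and an already aligned address needs 0, which makes the jump land directly
on the "done" label.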

Gerd



Last modified: Thu, 15 Apr 21 08:11:13 -0700