Author: Gerd Isenberg
Date: 11:13:12 02/06/02
On February 05, 2002 at 08:32:35, Antonio Senatore wrote:

>Hi friends:
>
>I have one array A[n] and another B[n] (both of the same dimension) and I want
>to make A = B without using a loop like
>
>for (i=0; i < n; i++) A[i] = B[i];
>
>My question is if is it possible to do that without using none kind of loops
>(and as I am working in C, I can't work with vectors or to use the lib
>"algorithm")
>
>Thanks in advance
>
>Antonio

A nice technique for unrolling small copy loops in (inline) assembler is a sequence of movsX instructions (X = b, w, d) followed by a "done" label, used instead of rep movsX. Subtracting the appropriate offset for n items from the label's address gives an indirect jump target that copies exactly n items. movsb and movsd are one-byte opcodes (movsw needs an extra 66h operand-size prefix in 32-bit code), so a run of 32 movsd instructions, copying at most 128 dword-aligned bytes, is quite cache friendly and may be faster than rep movsd.

From the "AMD Athlon™ Processor x86 Code Optimization" guide, http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_739_2983,00.html, "Example 2: Optimized memcpy() for Any Data Size or Alignment":

    ...
    $memcpy_do_align:
        mov  ecx,8                  ;a trick that's faster than rep movsb...
        sub  ecx,edi                ;align destination to qword
        and  ecx,111b               ;get the low bits
        sub  ebx,ecx                ;update copy count
        neg  ecx                    ;set up to jump into the array
        add  ecx,offset $memcpy_align_done
        jmp  ecx                    ;jump to array of movsb's

    align 4
        movsb
        movsb
        movsb
        movsb
        movsb
        movsb
        movsb
        movsb

    $memcpy_align_done:             ;destination is dword aligned
        mov  ecx,ebx                ;number of bytes left to copy
        shr  ecx,6                  ;get 64-byte block count
        jz   $memcpy_ic_2           ;finish the last few bytes
        cmp  ecx,IN_CACHE_COPY/64   ;too big 4 cache? use uncached copy
        jae  $memcpy_uc_test
    ...

Gerd
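In plain C, the closest analog to jumping into a run of movsX instructions is a switch whose cases fall through: control enters at the case for n and executes the remaining copies, with no loop at all. This is only an illustrative sketch, not code from the AMD guide; the name copy_small, the int element type and the limit of 8 elements are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n ints (0 <= n <= 8) from src to dst without a loop.
 * Control enters the unrolled sequence at the case matching n and falls
 * through the remaining cases, so exactly n element copies run, much like
 * jumping n instructions before the "done" label in a run of movsd.
 * copy_small is a hypothetical name used only for this illustration. */
static void copy_small(int *dst, const int *src, size_t n)
{
    assert(n <= 8); /* the unrolled sequence has only 8 slots */
    switch (n) {
    case 8: *dst++ = *src++; /* fall through */
    case 7: *dst++ = *src++; /* fall through */
    case 6: *dst++ = *src++; /* fall through */
    case 5: *dst++ = *src++; /* fall through */
    case 4: *dst++ = *src++; /* fall through */
    case 3: *dst++ = *src++; /* fall through */
    case 2: *dst++ = *src++; /* fall through */
    case 1: *dst++ = *src++; /* fall through */
    case 0: break;           /* plays the role of the "done" label */
    }
}
```

A call like copy_small(A, B, 5) copies the first five elements of B into A. A compiler will often turn such a dense switch into a jump table, which is close to the indirect jmp in the assembler version, but that choice is up to the compiler.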
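The first four instructions of the AMD prologue compute how many bytes must be copied singly before the destination is qword aligned: (8 - edi) & 7, which is 0 when edi is already a multiple of 8, and then subtract that from the copy count in ebx. A rough C rendering of that arithmetic follows; the helper name align_head and the precondition that the copy is at least that long are assumptions for illustration, since the surrounding checks in the AMD routine are elided above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the AMD alignment prologue:
 *   mov ecx,8 / sub ecx,edi / and ecx,111b  ->  head = (8 - dst) & 7
 *   sub ebx,ecx                             ->  *count -= head
 * Returns the number of leading bytes to copy one at a time so that the
 * destination becomes 8-byte (qword) aligned, and reduces *count by it.
 * head is 0 when dst is already aligned. */
static size_t align_head(uintptr_t dst, size_t *count)
{
    size_t head = (size_t)((8u - dst) & 7u);
    assert(head <= *count); /* assumption: the copy is long enough to align */
    *count -= head;
    return head;
}
```

For example, a destination at address 0x1003 gives head = 5 and a destination at 0x1000 gives head = 0. The remaining instructions (neg ecx, add ecx,offset $memcpy_align_done, jmp ecx) then enter the run of movsb instructions head bytes before the label, so exactly head bytes get copied.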
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.