Author: Bruce Moreland
Date: 18:45:27 04/20/00
On April 19, 2000 at 15:33:14, Dann Corbit wrote:
>On April 19, 2000 at 15:23:54, Michel Langeveld wrote:
>
>>On April 19, 2000 at 14:54:34, Dann Corbit wrote:
>>
>>>On April 19, 2000 at 14:33:38, Andrew Dados wrote:
>>>
>>>>On April 19, 2000 at 14:28:33, Andrew Dados wrote:
>>>>
>>>>>On April 19, 2000 at 13:49:25, Dann Corbit wrote:
>>>>>
>>>>>>What is the fastest way to fill a linear array of bytes with zero, given the
>>>>>>following conditions:
>>>>>>1. Intel PII or higher CPU
>>>>>>2. Guarantee that the number of bytes is an even number?
>>>>>>
>>>>>>I am porting a chess program, and memset() is the bottleneck. I don't need to
>>>>>>memset an arbitrary character. It's always zero.
>>>>>
>>>>>for pure 32bit windoze (assuming es==ds):
>>>>>
>>>>>asm
>>>>> mov edi, begin_address
>>>>> mov ecx, count ; number of words to fill
>>>>> xor eax,eax ; filling with 0x0000
>>>>> shr ecx,2 ; here carry gets set if count is not divisible by 4
>>>>> rep stosd
>>>>> jnc @finito
>>>>> stosw ; fill extra word if count mod 4 !=0
>>>>>@finito:
>>>>>end;
>>>>
>>>>Oops, count above is of course the number of *bytes* to fill.
>>>> If count were the number of words, then shift ecx by 1 only.
>>>
>>>Thanks.
>>>
>>>Turns out, I have a guarantee that the objects will always be 4 byte integers,
>>>so here is what I have so far:
>>>
>>>/*
>>>for pure 32bit windoze (assuming es==ds):
>>>*/
>>>void __cdecl fillit(unsigned long *begin_address, unsigned long count_of_longs)
>>>{
>>> _asm {
>>> mov edi, begin_address
>>> mov ecx, count_of_longs ; number of longs to fill
>>> mov eax, 0 ; filling with 0x0000
>>> rep stosd
>>> }
>>>}
>>
>>Keep in mind that this is of course not the fastest for small sizes (probably
>><= 4 bytes).
>>
>>Also make sure that your function will be inlined. I think it's better to use
>>__fastcall, or better still to make a #define of it.
>
>It was for clearing (rather large) hash tables. Unfortunately, the assembly
>version was indistinguishable from the library function in terms of speed, so I
>just went back to the fully portable memset().
Good plan. You guys were trying to rewrite "memset" anyway, which is pointless:
from the compiler's point of view it's not just an inline function, it's a
compiler primitive. You might as well write your own "add" function. You should
only have to do this if there is some odd case that the compiler handles poorly.
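For what it's worth, here is a minimal sketch of the portable route, assuming the hash table is a plain array of 4-byte entries as described above. The names hash_entry, clear_hash_memset, clear_hash_loop and hash_is_clear are made up for illustration and come from nobody's engine. The hand-written loop is there only as something to time against memset(), not as a recommendation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry type: the thread only says entries are 4-byte integers. */
typedef uint32_t hash_entry;

/* Portable version: leave the choice of fill strategy to the compiler/library. */
void clear_hash_memset(hash_entry *table, size_t count)
{
    assert(table != NULL || count == 0);
    if (count != 0)
        memset(table, 0, count * sizeof *table);
}

/* Plain C counterpart of the "rep stosd" idea, for comparison only. */
void clear_hash_loop(hash_entry *table, size_t count)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        table[i] = 0;
}

/* Returns 1 if every entry is zero, 0 otherwise. */
int hash_is_clear(const hash_entry *table, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (table[i] != 0)
            return 0;
    return 1;
}
```

Both functions leave the table in the same state, so you can swap one for the other and time them on your own machine. Whether the loop ever beats memset() depends on your compiler and CPU.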
bruce
Last modified: Thu, 15 Apr 21 08:11:13 -0700