Computer Chess Club Archives


Subject: Re: Speaking of fast ways to do things...

Author: Dann Corbit

Date: 12:33:14 04/19/00


On April 19, 2000 at 15:23:54, Michel Langeveld wrote:

>On April 19, 2000 at 14:54:34, Dann Corbit wrote:
>
>>On April 19, 2000 at 14:33:38, Andrew Dados wrote:
>>
>>>On April 19, 2000 at 14:28:33, Andrew Dados wrote:
>>>
>>>>On April 19, 2000 at 13:49:25, Dann Corbit wrote:
>>>>
>>>>>What is the fastest way to fill a linear array of bytes with zero, given the
>>>>>following conditions:
>>>>>1.  Intel PII or higher CPU
>>>>>2.  Guarantee that the number of bytes is an even number?
>>>>>
>>>>>I am porting a chess program, and memset() is the bottleneck.  I don't need to
>>>>>memset an arbitrary character.  It's always zero.
>>>>
>>>>for pure 32bit windoze (assuming es==ds):
>>>>
>>>>asm
>>>> mov edi, begin_address
>>>> mov ecx, count ; number of words to fill
>>>> xor eax,eax    ; filling with 0x0000
>>>> shr ecx,2      ; here carry gets set if count is not divisible by 4
>>>> rep stosd
>>>> jnc @finito
>>>> stosw          ; fill extra word if count mod 4 !=0
>>>>@finito:
>>>>end;
>>>
>>>Oops... count above is of course the number of *bytes* to fill.
>>> If count were the number of words, then shift ecx by 1 only.
>>
>>Thanks.
>>
>>Turns out, I have a guarantee that the objects will always be 4 byte integers,
>>so here is what I have so far:
>>
>>/*
>>On April 19, 2000 at 14:28:33, Andrew Dados wrote:
>>
>>On April 19, 2000 at 13:49:25, Dann Corbit wrote:
>>
>>What is the fastest way to fill a linear array of bytes with zero, given the
>>following conditions:
>>1.  Intel PII or higher CPU
>>2.  Guarantee that the number of bytes is an even number?
>>
>>I am porting a chess program, and memset() is the bottleneck.  I don't need to
>>memset an arbitrary character.  It's always zero.
>>
>>for pure 32bit windoze (assuming es==ds):
>>*/
>>void __cdecl fillit(unsigned long *begin_address, unsigned long count_of_longs)
>>{
>>  _asm {
>>    mov edi, begin_address
>>    mov ecx, count_of_longs ; number of longs to fill
>>    mov eax, 0              ; filling with 0x0000
>>    rep stosd
>>  }
>>}
>
>Keep in mind that this is of course not the fastest with small sizes (probably
><= 4 bytes).
>
>Also make sure that your function will be inlined. I think it's better to use
>__fastcall, or better still, make it a #define.

It was for clearing (rather large) hash tables.  Unfortunately, the assembly
version was indistinguishable from the library function in terms of speed, so I
just went back to the fully portable memset().
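
For anyone who wants to repeat the comparison, here is a minimal sketch of the two portable options in plain C. The type name hash_entry and both function names are made up for illustration; the thread only establishes that the entries are 4-byte integers. The loop version is meant as a rough C counterpart of the rep stosd code above (one 32-bit store per entry), not a translation of anyone's actual engine code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry type: the thread only says entries are 4-byte integers. */
typedef uint32_t hash_entry;

/* Portable version: let the C library choose how to fill the memory. */
void clear_table_memset(hash_entry *table, size_t count)
{
    assert(table != NULL || count == 0);
    if (count != 0)
        memset(table, 0, count * sizeof *table);
}

/* Rough C counterpart of the rep stosd loop: one 32-bit store per entry. */
void clear_table_loop(hash_entry *table, size_t count)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        table[i] = 0;
}
```

Both functions leave the table in the same state, so either can be dropped in and timed against the other. Some optimizing compilers may turn the loop into a memset() call or wide stores anyway, which would be consistent with the result above that the hand-written version was no faster than the library function.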




Last modified: Thu, 15 Apr 21 08:11:13 -0700
