Computer Chess Club Archives



Subject: Re: more results

Author: Gerd Isenberg

Date: 07:05:22 07/07/03



On July 07, 2003 at 09:34:58, Bo Persson wrote:

>On July 07, 2003 at 08:48:39, Gerd Isenberg wrote:
>
>>at least one colleague sees the same strange effect as Dieter:
>>
>>Gerd
>>
>>
>>Gerd P4 2.4GHz:
>>       nothing 3951541892 13.390
>>         abs() 1713113360 13.141
>>  simple_abs() 1713113360 19.562
>>    omid_abs() 1713113360 13.672
>>     sbb_abs() 1713113360 17.969
>>     cdq_abs() 1713113360 17.625
>>    fish_abs() 1713113360 21.750
>>     sar_abs() 1713113360 16.984
>>   cmovl_abs() 1713113360 16.782
>>   cmovs_abs() 1713113360 16.781
>>
>
>This isn't as strange as it might seem. We are trying to time an *extremely*
>small piece of code. The instructions selected by the compiler actually execute
>at different speeds on different processors.
>
>I have MSVC 7.1 where do_nothing results in:
>
>; 304  :     for (i = 0; i < MAX_ITERATIONS; ++i) {
>; 305  :
>; 306  :         // subtract so we get both positive and negative numbers
>; 307  :         int a = rand() - 16384;
>
>  00020	e8 00 00 00 00	 call	 _rand
>  00025	4f		 dec	 edi
>
>; 308  :
>; 309  :         sum += a;
>
>  00026	8d b4 06 00 c0
>	ff ff		 lea	 esi, DWORD PTR [esi+eax-16384]
>  0002d	75 f1		 jne	 SHORT $L10491
>
>; 310  :     }
>
>Here an LEA is used to compute sum + a - 16384 in a single instruction!
>
>while test_abs is just slightly different:
>
>; 25   :     for (i = 0; i < MAX_ITERATIONS; ++i) {
>; 26   :
>; 27   :         // subtract so we get both positive and negative numbers
>; 28   :         int a = rand() - 16384;
>
>  00020	e8 00 00 00 00	 call	 _rand
>  00025	2d 00 40 00 00	 sub	 eax, 16384		; 00004000H
>
>; 29   :
>; 30   :         sum += abs(a);
>
>  0002a	99		 cdq
>  0002b	33 c2		 xor	 eax, edx
>  0002d	2b c2		 sub	 eax, edx
>  0002f	03 f0		 add	 esi, eax
>  00031	4f		 dec	 edi
>  00032	75 ec		 jne	 SHORT $L10356
>
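
For readers following the listing: the cdq / xor / sub sequence above corresponds roughly to the C below. This is only a sketch. The name mask_abs is made up for illustration and is not one of the functions timed in this thread, and it assumes that >> on a negative int is an arithmetic shift (implementation-defined in C, but what the x86 compilers discussed here do).

```c
#include <limits.h>

/* Hypothetical helper, not taken from the benchmark in this thread.
   Assumes right-shifting a negative int replicates the sign bit. */
static int mask_abs(int x)
{
    /* Like cdq: mask is 0 for x >= 0 and -1 (all ones) for x < 0. */
    int mask = x >> (sizeof(int) * CHAR_BIT - 1);

    /* Like xor eax, edx ; sub eax, edx:
       x >= 0: (x ^ 0) - 0     == x
       x <  0: (x ^ -1) - (-1) == ~x + 1 == -x */
    return (x ^ mask) - mask;
}
```

As with abs(), the result is not representable for INT_MIN, so that input is outside what the sketch handles. The values in the test loop (rand() - 16384) never get near it.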
>
>On a P4 the LEA instruction is broken up into several (but unspecified)
>micro-ops. It is not fast - in fact Intel says that it is no longer an
>optimization to use it! On the PIII, of course, it has dedicated hardware...
>

aha, yes but that's not the point, see below.


>Except for the CDQ, all the other instructions are in the core RISC set, which
>executes at up to 3 instructions per clock on a P4.
>
>
>So doing something fast *can* be quicker than doing nothing slowly. :-)

at least most often ;-)

>
>
>Bo Persson
>bop2@telia.com

Hi Bo,

this omid_abs was strange:

Sebastian's "High Media" P4 2GHz:
       nothing 3951541892 18.666
         abs() 1713113360 19.959
  simple_abs() 1713113360 25.487
    omid_abs() 1713113360 11.116 !!!!!!!!
     sbb_abs() 1713113360 24.365
     cdq_abs() 1713113360 24.235
    fish_abs() 1713113360 29.522
     sar_abs() 1713113360 23.083
   cmovl_abs() 1713113360 24.325
   cmovs_abs() 1713113360 24.325

and this from Dieter's P4:

MSVC, Russel's code, -Ox2 -Ob2 -G6 -Gr -GF

       nothing 3951541892 13.309
         abs() 1713113360 14.400
  simple_abs() 1713113360 17.936
    omid_abs() 1713113360 7.932   !!! Yes, reproducible
     sbb_abs() 1713113360 17.144
     cdq_abs() 1713113360 17.555
    fish_abs() 1713113360 20.900
     sar_abs() 1713113360 16.464
   cmovl_abs() 1713113360 17.365
   cmovs_abs() 1713113360 17.345

http://www.talkchess.com/forums/1/message.html?304949
and the following messages. Do you have an explanation for why adding code
doubles the speed?
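
For anyone who wants to try reproducing this, here is a minimal sketch of the kind of timing loop shown in the listings (sum += f(rand() - 16384)). It is not the code that produced the numbers above. The names abs_fn, lib_abs and run_loop, the fixed seed, and the use of clock() are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical harness, not the benchmark used in this thread. */
typedef int (*abs_fn)(int);

/* Wrapper so the library abs() fits the abs_fn signature. */
static int lib_abs(int x) { return abs(x); }

/* Runs the loop from the listing and reports elapsed CPU time.
   The sum is unsigned so that wrap-around is well defined. */
static unsigned run_loop(abs_fn f, long iterations, double *seconds)
{
    unsigned sum = 0;
    long i;
    clock_t start;

    srand(1);                    /* same rand() sequence for every candidate */
    start = clock();
    for (i = 0; i < iterations; ++i) {
        /* subtract so we get both positive and negative numbers */
        int a = rand() - 16384;
        sum += (unsigned)f(a);
    }
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sum;
}
```

A call would look like run_loop(lib_abs, 100000000L, &secs), printing the returned sum next to secs the way the tables above do. Note that going through a function pointer blocks inlining, so this measures call overhead as well. In a loop this small, exactly such details (like the LEA versus SUB/ADD choice Bo points out) can dominate the result.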

Regards,
Gerd





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.