Computer Chess Club Archives


Subject: Re: Speed factors with 32 bit to 64 migration

Author: Dieter Buerssner

Date: 10:43:45 05/01/05

On May 01, 2005 at 09:09:40, Gerd Isenberg wrote:

>On May 01, 2005 at 08:14:16, Dieter Buerssner wrote:
>
>>On May 01, 2005 at 07:43:35, Gerd Isenberg wrote:
>>
>>>On May 01, 2005 at 05:15:57, Dieter Buerssner wrote:
>>>
>>>>For real 64 bit multiplications/devisions: How is the cycle count for these
>>                                   ^ that typo looks horrible.
>>>>instructions on AMD64? I would fear that they need quite a few more cycles than
>>>>their 32-bit counterparts.
>>>
>  Syntax            Encoding        Decode      Latency
>                    First ModRM     type        (cycles)
>>>IMUL mreg16        F7h 11-101-xxx  VectorPath  4
>>>IMUL mreg32/64     F7h 11-101-xxx  Double      3/5
>>>MUL  mreg16        F7h 11-100-xxx  VectorPath  4
>>>MUL  mreg32/64     F7h 11-100-xxx  Double      3/5
>>>DIV  mreg16/32/64  F7h 11-110-xxx  VectorPath  23/39/71
>>>IDIV mreg16/32/64  F7h 11-111-xxx  VectorPath  26/42/74
>>
>>Gerd, thanks for posting the numbers. I am not sure I understand the table.
>>Is F7h the opcode prefix for 64-bit operations?
>
>No, F7h is the first opcode byte of MUL/DIV with the implicit AX/EAX/RAX and
>DX/EDX/RDX operands. The 64-bit (REX.W) prefix comes on top of that.

OK, that clears it up. The F7h in every row confused me, because it was fairly
obvious that some of these instructions would need a prefix...
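To illustrate what the implicit operands buy: in 64-bit mode, MUL r/m64 (REX.W + F7h /4) leaves the full 128-bit product in RDX:RAX with a single instruction. A 32-bit build has no such instruction and must assemble the same result from four 32x32->64 multiplies. The following is a minimal C sketch of that; `mul64x128` is a hypothetical name for illustration, not a library function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: full 64x64 -> 128-bit unsigned product built
   only from 32x32 -> 64-bit multiplies, the way a 32-bit target has
   to do it. *hi gets what MUL r/m64 leaves in RDX, *lo what it
   leaves in RAX. */
static void mul64x128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    assert(hi && lo);

    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four partial products: the cost a 32-bit build pays. */
    uint64_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */

    /* Middle column: at most 3 * (2^32 - 1), so no 64-bit overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

For example, 0xFFFFFFFFFFFFFFFF squared gives hi = 0xFFFFFFFFFFFFFFFE and lo = 1. With GCC or Clang on x86-64, the compiler extension `unsigned __int128` typically lets the compiler emit the single MUL instead.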

Multiplication looks very fast. I remember the 386 days, when I knew the cycle
counts for many opcodes by heart. I have forgotten much of it, but I seem to
remember that MUL reg32 took ~40 cycles, while ADD and ADC took 2 cycles. Those
were also the days when one could rather reliably just count cycles. In a
multi-precision library, it was almost enough to count the MULs for
multiplication (and the MULs and DIVs for division). The ratio of DIV to MUL
cycles was also much more favorable then, IIRC. Having a DIV instruction at all
was characteristic of x86; other processors did not have one (at least not one
that also returned the remainder). Also, quite surprisingly, shifts and
especially rotates through carry (RCL/RCR, which were useful for some
multi-precision routines) were very slow on x86 then.
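The cycle-counting argument can be sketched in C. This is a minimal schoolbook version, assuming 32-bit limbs stored least significant first; `mp_mul`, `mp_divmod_1` and the two counters are hypothetical names for illustration, not taken from any particular library. An n-limb by m-limb product performs exactly n*m 32x32->64 multiplies, and dividing an n-limb number by a single limb performs n 64/32 divisions, which is why counting MULs and DIVs was a good cost estimate when those instructions dominated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration only: count the operations that a hand-coded 386
   routine would do with MUL r32 and DIV r32. */
static unsigned long mul_count, div_count;

/* r[0..n+m-1] = a[0..n-1] * b[0..m-1]; r must not alias a or b. */
static void mp_mul(uint32_t *r, const uint32_t *a, size_t n,
                   const uint32_t *b, size_t m)
{
    for (size_t k = 0; k < n + m; k++)
        r[k] = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t carry = 0;
        for (size_t j = 0; j < m; j++) {
            /* (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so t never overflows.
               The additions stand in for an ADD/ADC chain. */
            uint64_t t = (uint64_t)a[i] * b[j] + r[i + j] + carry;
            mul_count++;
            r[i + j] = (uint32_t)t;          /* low half, like EAX  */
            carry = (uint32_t)(t >> 32);     /* high half, like EDX */
        }
        r[i + m] = carry;
    }
}

/* q[0..n-1] = a / d; returns a % d. Works from the most significant
   limb down. The running remainder stays below d, so every 64/32
   quotient fits in 32 bits, as DIV r32 requires. */
static uint32_t mp_divmod_1(uint32_t *q, const uint32_t *a, size_t n,
                            uint32_t d)
{
    assert(d != 0);
    uint32_t rem = 0;
    for (size_t i = n; i-- > 0;) {
        uint64_t t = ((uint64_t)rem << 32) | a[i];
        q[i] = (uint32_t)(t / d);   /* quotient, like EAX  */
        rem = (uint32_t)(t % d);    /* remainder, like EDX */
        div_count++;
    }
    return rem;
}
```

For example, squaring the two-limb value 2^64-1 yields the limbs {1, 0, 0xFFFFFFFE, 0xFFFFFFFF} after exactly four multiplies. Dividing that four-limb result by 2 takes four divisions and leaves remainder 1.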

Sorry for the digression from Steven Edwards' original topic.

Thanks for your answers,
Dieter



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.