Computer Chess Club Archives



Subject: Re: Expert Assembler Question

Author: Mridul Muralidharan

Date: 00:04:54 08/28/05



Hi Gerd,

  I have stopped using char/short because of possible stalling problems (the lower
memory footprint is usually not worth it, since a corresponding 'int array' usually
fits into a single cache line anyway).
From what limited experience I have, I have found cache thrashing, branch
misprediction and register stalls to be my performance killers.
Cache thrashing is especially bad when data is not aligned :( (preferably on a
paragraph boundary).
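
To make the alignment point concrete, here is a minimal C11 sketch (an illustration only, not code from this thread). It counts how many cache lines a block of bytes touches, depending on where the block starts. The 64-byte line size, the helper name lines_spanned and the piece_value table are all assumptions made for the example; check the line size of your own CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; this varies by CPU. */
#define CACHE_LINE 64

/* Number of cache lines touched by `size` bytes starting at address `addr`,
   for a line size of `line` bytes. */
static size_t lines_spanned(uintptr_t addr, size_t size, size_t line)
{
    if (size == 0)
        return 0;
    uintptr_t first = addr / line;              /* line holding the first byte */
    uintptr_t last  = (addr + size - 1) / line; /* line holding the last byte  */
    return (size_t)(last - first + 1);
}

/* Hypothetical table: 16 entries of 4 bytes = 64 bytes.
   _Alignas asks the compiler to start it on a line boundary,
   so the whole table sits in exactly one (assumed) cache line. */
static _Alignas(CACHE_LINE) int32_t piece_value[16];
```

With these assumptions, a 64-byte table that starts on a line boundary touches one line, while the same table starting 16 bytes later touches two: lines_spanned(0, 64, 64) is 1 and lines_spanned(16, 64, 64) is 2. Every access that straddles the boundary then costs an extra line in the cache.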

Thanks for your comments - enlightening as usual!
- Mridul

On August 27, 2005 at 17:40:38, Gerd Isenberg wrote:

><snip>
>>>>Are you saying Gerd that:
>>>>
>>>> mov EAX, mem32 is faster than mov AL,mem8 ?
>>>
>>>Yes, slightly - according to the optimization manual, three (not 1!) cycles
>>>instead of four (in 32-bit as well as in 64-bit mode):
>>
>>Ok.
>>
>>
>>>Instruction              Opcode  ModRM       Decode      Latency
>>>MOV reg8, mem8           8Ah     mm-xxx-xxx  DirectPath  4
>>>MOV reg16, mem16         8Bh     mm-xxx-xxx  DirectPath  4
>>>MOV reg32/64, mem32/64   8Bh     mm-xxx-xxx  DirectPath  3
>>
>>So why are chess engines still using 8-bit boards and tables?
>>
>>He he he....
>>
>
>hmm... I have to qualify that a bit, and my answer to Mridul as well.
>
>Most simple arithmetic and bitwise instructions take four cycles in both their
>8-bit and 16/32/64-bit forms.
>
>Instruction                    Opcode  ModRM       Decode      Latency
>ADD reg8, mem8                 02h     mm-xxx-xxx  DirectPath  4
>ADD reg16/32/64, mem16/32/64   03h     mm-xxx-xxx  DirectPath  4
>
>CMP reg8, mem8                 3Ah     mm-xxx-xxx  DirectPath  4
>CMP reg16/32/64, mem16/32/64   3Bh     mm-xxx-xxx  DirectPath  4
>
>But see 2.23 32-Bit Integral Data Types in the manual...
>Alignment and stalling issues are probably more important.
>
>
><snip>
>>>>>Also, avoid the shorter but redundant EAX-Move encoding:
>>>>>
>>>>>MOV AX/EAX/RAX, mem16/32/64 A1h DirectPath 4/3/3
>>>>
>>>>Right, never use it.
>>>
>>>Nope, the A1h mem16/32/64 move has the same latency (4/3/3) as the one-byte-longer
>>>8Bh opcode for all gp-registers. Sorry for the confusion. Anyway, it is usually the
>>>choice of the assembler or compiler, unless you code directly in machine
>>>language ;-)
>>
>>So it has been fixed after all, not that I see much practical use.
>
>The old 8080 accumulator. Still a privileged register with some shorter opcodes here
>and there.
>
>Cheers,
>Gerd
>
>>
>>Thanks Gerd.
>>
>>Ed
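
On the question above of why engines still use 8-bit boards: the trade-off is footprint against load width. The following C sketch is only an illustration under stated assumptions, not anyone's engine code. The function names, the 16-entry value table and the piece codes are hypothetical. It keeps the same board once as bytes (64 bytes) and once as 32-bit ints (256 bytes) and sums material from each. On x86, compilers typically zero-extend the byte load into a full register, which is the usual way to sidestep partial-register stalls, but that depends on the compiler and its flags.

```c
#include <stddef.h>
#include <stdint.h>

enum { SQUARES = 64, PIECE_CODES = 16 };

/* Compact board: 64 squares * 1 byte = 64 bytes.
   Each square is read with a byte load and widened before indexing. */
static int material_u8(const uint8_t board[SQUARES],
                       const int32_t value[PIECE_CODES])
{
    int sum = 0;
    for (size_t sq = 0; sq < SQUARES; ++sq)
        sum += value[board[sq] & (PIECE_CODES - 1)];
    return sum;
}

/* Wide board: 64 squares * 4 bytes = 256 bytes.
   Each square is read with a full 32-bit load, at four times the footprint. */
static int material_i32(const int32_t board[SQUARES],
                        const int32_t value[PIECE_CODES])
{
    int sum = 0;
    for (size_t sq = 0; sq < SQUARES; ++sq)
        sum += value[board[sq] & (PIECE_CODES - 1)];
    return sum;
}
```

Both functions return the same sum for the same position; only the memory layout differs. Which one is faster in a real engine is something to measure, since it depends on cache pressure as much as on the per-instruction latencies quoted above.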




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.