Author: Gerd Isenberg
Date: 01:01:50 08/02/04
<snip>
>
>Hi Gerd,
>
>Thanks for the very useful comments.
>
>The exceptions you mentioned are important but very rare (<1% in practical
>play). I have already thought on similar problems before. The fact is that the
>most of the draw-repetition positions are generated by the check sequences with
>chains consist of even number of moves, or by moving different pieces. But,
>anyway, we will try to find elegant solution for those exceptions
>
>Your details about the LOOP command are very interesting, I think I have written
>thousands of LOOP commands in Axon!
>
>
>best regards,
>
>Vladan
>
>(Axon programmer)

Hi Vladan,

yes, for qsearch one may safely ignore such rare position repetitions.
They are not likely to happen in perpetual checks.

The problem with Athlon's LOOP instruction is that it is implemented as a
VectorPath instruction, which blocks resources that would otherwise be
available in parallel.

Athlon32
Instruction            Opcode  Decode      Latency
LOOP disp8             E2h     VectorPath  8
DEC CX/ECX             49h     DirectPath  1
JNE/JNZ disp8          75h     DirectPath  1

AMD64
Instruction            Opcode  Decode      Latency
LOOP disp8             E2h     VectorPath  9/8
DEC ECX                49h     DirectPath  1
JNZ/JNE short disp8    75h     DirectPath  1

For LOOP on AMD64, the first latency value (9!) is for 32-bit mode.
The second is for 64-bit mode.

Another possible optimization concerns rep stosw.

----------------------------------------------------------------------------
Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors

8.3 Repeated String Instructions

Optimization
Avoid using the REP prefix when performing string operations, especially when
copying blocks of memory.

Rationale
In general, using the REP prefix to repeatedly perform string instructions is
less optimal than other methods, especially when copying blocks of memory. For
a discussion of alternate memory-copy methods, see “Appropriate Memory Copying
Routines” on page 112.
...
Inline REP String with Low Counts
If the repeat count is constant and low (less than eight), expand REP string
instructions into equivalent sequences of simple AMD64 instructions.
Use an inline sequence of loads and stores to accomplish the move. Use a
sequence of stores to emulate rep stos. This technique eliminates the setup
overhead of REP instructions and increases instruction throughput.
----------------------------------------------------------------------------

For example, one may use six MMX stores to zero 48 bytes (the target should be
8-byte aligned):

    lea   eax, [chain_list]   ; eax = address of the 48-byte block
    pxor  mm0, mm0            ; mm0 = 0
    movq  [eax+0*8], mm0      ; six 8-byte stores, 6 * 8 = 48 bytes
    movq  [eax+1*8], mm0
    movq  [eax+2*8], mm0
    movq  [eax+3*8], mm0
    movq  [eax+4*8], mm0
    movq  [eax+5*8], mm0

Cheers,
Gerd
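For readers who do not write assembly, the six-store idea can be sketched roughly in C: six explicit 8-byte stores instead of a counted loop or a rep stos-style fill. The function name clear_chain_list and the view of the block as six uint64_t words are assumptions made for illustration; the post itself only gives the name chain_list and the 48-byte size. A compiler is free to pick different instructions than the MMX sequence in the post, so this shows the unrolling, not the exact code generation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not code from Axon or from the post:
   zero 48 bytes with six explicit 8-byte stores, with no loop counter
   and no rep-prefixed string instruction written in the source. */
static void clear_chain_list(uint64_t *p)
{
    /* The post asks for 8-byte alignment; check it in debug builds. */
    assert(((uintptr_t)p & 7u) == 0);

    p[0] = 0;
    p[1] = 0;
    p[2] = 0;
    p[3] = 0;
    p[4] = 0;
    p[5] = 0;
}
```

It would be called as clear_chain_list(chain_list) on an array declared as uint64_t chain_list[6].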