Author: Mridul Muralidharan
Date: 00:04:54 08/28/05
Hi Gerd,

I have stopped using char/short because of possible stalling problems (the lower memory footprint is usually not worth it, since a corresponding 'int array' usually fits into a single cache line anyway).
From what limited experience I have, I have found cache thrashing, branch misprediction and register stalls to be my performance killers.
Cache thrashing is especially bad when data is not aligned :( (preferably on a paragraph boundary).

Thanks for your comments - as usual, enlightening!

- Mridul

On August 27, 2005 at 17:40:38, Gerd Isenberg wrote:

><snip>
>>>>Are you saying Gerd that:
>>>>
>>>> mov EAX, mem32 is faster than mov AL,mem8 ?
>>>
>>>Yes, slightly - according to the optimization manual three (not 1!) cycles
>>>instead of four (both in 32-bit as well as in 64-bit mode):
>>
>>Ok.
>>
>>
>>>MOV reg8, mem8           8Ah  mm-xxx-xxx  DirectPath  4
>>>MOV reg16, mem16         8Bh  mm-xxx-xxx  DirectPath  4
>>>MOV reg32/64, mem32/64   8Bh  mm-xxx-xxx  DirectPath  3
>>
>>So why are chess engines still using 8-bit boards and tables?
>>
>>He he he....
>>
>
>hmm... I have to qualify that a bit, and my answer to Mridul as well.
>
>Most simple arithmetic and bitwise instructions take four cycles for both 8-bit
>and 16/32/64-bit operands.
>
>ADD reg8, mem8                 02h  mm-xxx-xxx  DirectPath  4
>ADD reg16/32/64, mem16/32/64   03h  mm-xxx-xxx  DirectPath  4
>
>CMP reg8, mem8                 3Ah  mm-xxx-xxx  DirectPath  4
>CMP reg16/32/64, mem16/32/64   3Bh  mm-xxx-xxx  DirectPath  4
>
>But see 2.23 32-Bit Integral Data Types in the manual...
>Alignment and stalling issues are probably more important.
>
>
><snip>
>>>>>Also, avoid the shorter but redundant EAX-Move encoding:
>>>>>
>>>>>MOV AX/EAX/RAX, mem16/32/64   A1h  DirectPath  4/3/3
>>>>
>>>>Right, never use it.
>>>
>>>Nope, the A1h mem16/32/64 move has the same latency (4/3/3) as the one-byte-longer
>>>8Bh opcode for all GP registers. Sorry for the confusion. Anyway, it is usually the
>>>choice of the assembler or compiler, unless you code directly in machine
>>>language ;-)
>>
>>So it has been fixed after all, not that I see much practical use.
>
>The old 8080 accumulator. Still a privileged register with some shorter opcodes here
>and there.
>
>Cheers,
>Gerd
>
>>
>>Thanks Gerd.
>>
>>Ed
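
To make the point about int arrays and alignment concrete, here is a minimal C11 sketch. It is only an illustration under assumptions that are not taken from the thread: a 64-byte cache line, a 16-entry lookup table, and the hypothetical names `CACHE_LINE`, `table8`, `table32` and `lines_spanned`. It shows that a 16-entry `int` table occupies exactly one cache line when it is aligned, and straddles two as soon as it starts at an unaligned address.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas */
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; adjust for the target CPU. */
#define CACHE_LINE 64

/* Hypothetical 16-entry lookup table, once as bytes and once as 32-bit ints.
   Both are forced onto a cache-line boundary. */
alignas(CACHE_LINE) uint8_t table8[16];
alignas(CACHE_LINE) int32_t table32[16];

/* The int version is four times larger, but still no bigger than one line. */
static_assert(sizeof table32 <= CACHE_LINE, "int table exceeds one cache line");

/* Number of cache lines touched by an object of `size` bytes starting at `addr`. */
size_t lines_spanned(uintptr_t addr, size_t size)
{
    uintptr_t first = addr / CACHE_LINE;
    uintptr_t last  = (addr + size - 1) / CACHE_LINE;
    return (size_t)(last - first + 1);
}
```

For example, `lines_spanned(0, 64)` is 1, while `lines_spanned(4, 64)` is 2: the same 64-byte table costs a second cache line once it is misaligned by a single `int`. For tables this small, the aligned `int` version touches no more lines than the byte version, which is the trade-off described above.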
Last modified: Thu, 15 Apr 21 08:11:13 -0700