Author: Bas Hamstra
Date: 18:09:13 11/19/99
Go up one level in this thread
On November 19, 1999 at 11:37:50, Ratko V Tomic wrote: >>> instruction (which are more expensive than straight fall through). >>> In assembly language one can easily code this so that >>> (piece,source_square) pair of values retreives a pointer to a code >>> label which has the right number of directions and steps each direction >>> coded to execute without jumps or loops and to change the direction >>> (on default end of direction), all in a single block of a fall through >> >code. >> >> I think you can as well do this in C with the good old goto? > >C doesn't have a FORTRAN style "computed goto." The closest thing it has is a >switch statement, but its shortcomings vs the assembly labguage are: >1) You have to code each "case" as separate statement. What do you do if you >have, say, 12*64=768 labels (or even more than 12 piece types if you add castle >capable & ep subtypes)? Type several thousands by hand? In assembler you make an >iterating (by count) macros of the handful basic types (12 or so) and the >dispatch table which allows jumping into the right place inside the repeated >macro block which ahs no explicit labels. Good point. Like Dan I wrote a metaprogram :) IE a codegenerator. But it took a lot of cutting and pasting too and the results were somewhat disappointing. My incremental pseudo-attacks are updated like this: XORQueen() switch(Square) { case 11: asm XOR [ESI+1], EBX asm XOR [ESI+2], EBX . . . case 12: } It is the fastest way I can think of, but you seem more like an assembly expert than me. >2) The switch statement will at best (even if you somehow type in [using C >macros and editor's cut&paste] few hundreds or thousands cases) perform doubly >indirect jump to the case statement, i.e. a (square,piece) pair will first >retrieve a number which would be used as an index into the array (built >implicitly by the compiler, without guarantee it will do so at all, and without >any control from you) of code addresses for different case statements (provided >it doesn't overflow its "switch" statement compiling buffers on switch with few >thousands case statements). Our assembly language table is indexed directly by >the piece type and square (fetched as two adjecent bytes without shifting the >bit patterns togehter) addressing thus a table of 12*256 = 3072 entries, each >being a direct address of a place in the repeated macro block where to jump; the >whole switchover (including termination at the end of the list) is then: > > LODSW ;get next (square,piece) pair of bytes > JMP [EBX+EAX*4] ;dispatch to "this piece on this square" routine > ;ebx has an address of the 3072 dword table > >If square can be represented (which it can) in the piece-square array as a >4*square (i.e. as 0, 4, 8, 12,..) then the dispatch table can be reduced to the >3072 byte table (instead of 3072 dwords) and the jump above could be of the >quicker form JMP [EBX+EAX] (i.e. without index scaling opcodes). >3) This large C switch statement, will produce code which checks the range of >the switch argument, to verify it falls within the dispatch array boundaries >(since this argument is a variable, picked out of an array, via computed index >itself picked from another array) and compiler doesn't know its inherent limits. >While on some compilers (of some versions) you can use pragmas to help it avoid >some checking, the code loses portability (e.g. on the same platform, say >Intel's processor, an assembly routine above would be portable between compilers >or versions of compilers, as a linked OBJ module or DLL, while the C code with >version specific pragmas would not be.) > >4) The register variable allocation across such large switch statement, with >fall through and branching subcases, would suffer in the compiled code. The >assembly code could have a well planned policy on register usage (especially the >byte registers, which are more numerous) based on knowing which data matter, >which says the same and which can change. > >5) If one needs to separate captures from non-captures (as one normally does), >the move generator is the most natural and economical place to do it. In that >case the assembly code wouldn't need the two pointers for the two output lists >(thus wasting an extra register variable, assuming it can even get to the point >where it would pick these pointers for register variables), but could push one >type of the moves on the stack (using single instruction PUSH ECX) and the other >type into an array (via single isntruction STOSW or STOSD). They could be >processed there (on the stack), if stack was temporarily redirected into the >right array, or in the worst case, they could be bulk-copied (which is quick) to >the final destinaton, before the stack gets reset. Again like Dan, I use 2 separate functions to generate caps and noncaps. So this argument isn't entirely convincing to me. >6) Since the entire "dispatch to the next piece" assembly code is quite short, 2 > assembly instructions, one could place dispatch to next at the end of each >(piece,square) routine, reducing the extra jump to common dispatch point for >each (piece,square) item. C switch can't do that at all (i.e. one would need to >replicate the entire switch inside each "case" statement just to avoid this >extra jump for one (piece,square) item from the list, then after the 2nd item it >would have to take the extra jump to the top level switch). > > >Overall, for this kind of task, a good assembly code would be 4-5 times faster >than the best C code/compiler could do, it would be much easier to code (less >repetitive and error prone, due to the hundreds manually entered "case" labels >in C) and it would be more portable across the compilers and languages on the >same CPU type (like the dominant Intel CPUs, with the dominant Win32 DLL API). On the whole very instructive and convincing. I thought there wasn't much you could do in asm that you couldn't in C. But apparently there are some neat things. Not that I am going to rewrite much in asm. I'd like to but argh, no time... >> Anyway I finally became convinced that the importancy of the speed of >> the movegenerator is modest, unless you want to make a Fritz. > >Well, it could benefit in another extreme, too, at the opposite end of the >spectrum from Fritz. E.g. a design where a program uses two levels of search >tree (similar to Botvinik's Pioneer scheme), one high level tree guided by the >chess knowledge and plans, which is a sparse & deep human-like tree, and the >numerous other low level trees produced by a fast alpha-beta searcher to "hop," >as it were, between the nodes of the high level tree (the low level seracher is >additionally constrained in each such "bridging" search by the specific >goals/targets set by the higher level code for that high level node serving as >the root of that low level search). Nah :) Don't believe in that. Let's face it, brute force has won the race. And what you are describing has holes. Still interesting, of course. Regards, Bas Hamstra.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.