Author: Vasik Rajlich
Date: 05:00:42 01/07/04
Go up one level in this thread
On January 06, 2004 at 11:33:45, Robert Hyatt wrote: >On January 06, 2004 at 09:56:17, Uri Blass wrote: > >>On January 06, 2004 at 09:35:54, Robert Hyatt wrote: >> >>>On January 05, 2004 at 16:39:04, Uri Blass wrote: >>> >>>>I see that crafty is using swap_list[32] and I think that the array is too long. >>>> >>>>There are only 8 queen directions and 8 knight directions and I think that for >>>>practical cases swap_list[16] can make Crafty faster with no problems(in theory >>>>it is possible that 16 is not enough but I do not imagine a practical case when >>>>it can happen. >>> >>>How would it make it faster? I only use the part of the list that is >>>actually necessary. >> >>The question is if allocating memory does not cost speed. > >And the answer is _no_. :) > >Now if you use malloc() that is another issue, but here we are not using >malloc. > >If you don't know much about the X86 architecture, here is the basic idea. > >When you enter a procedure, the first thing you do is subtract some constant >from the stack pointer, where this "constant" is the amount of memory this >function needs for local variables. Now you save this address in EBP, >a special register used specifically for this purpose. Any further push >instructions will modify ESP, but EBP is left alone and will always point to >the first of 128 bytes you can use for local memory in this function. > >Lets assume that the 32 int array is all there is that is local. I'd just >subtract 128 (bytes) from ESP and copy that to EBP. Now EBP points >to the start of a 128 byte "work area" on the stack that will not be modified >by stack operations since the stack pointer is now well below these addresses >(on the PC the stack grows in the negative direction, ie 1000, then 999, >etc.) Local variables are now referenced as an offset from EBP and that is >all there is to it. No overhead whatsoever (unless you count a subtract >from ESP and then a move to EBP, which is exactly two X86 instructions. Wouldn't the work area be moved into one of the high-priority on-processor caches when the function is executed, so that bigger work area => more space taken up in this cache? (Although 16 bytes is trivial ..) I've been trying to avoid doing things like having multiple evaluation functions because more code paths => higher usage of the cache. I notice that in crafty you have a "search" and a "search of root move" - are you quite sure that this is faster than one "search" with a few conditional statements? Vas > >> >>> >>>Also, shortening the list will wreck things for those that set up oddball >>>positions. It is certainly possible to have 16 pieces for each side attacking >>>a single square, since my Swap() understands "x-ray attacks" where two pieces >>>are in battery along a rank/file/diagonal. I need 32 to handle the worst case >>>of both sides having 9 queens, two rooks, two bishops, two knights and a king >>>attacking the target square... >> >>The bishops cannot attack the same square so the worst case is 30 pieces >>It is practically even less because it is impossible to promote pawns to queens >>without some captures and only 2 pawns can attack the same square. > >Depends. We might be talking about a real game. Or we might be talking about >a bizarre position someone sets up. IE Crafty can work with N queens, where N >is _large_. Some programs just blow up, but I didn't want that to happen. >So you can cheerfully set up a position with 32+ white queens and Crafty will >cheerfully search the resulting tree with no problems. > > > > >> >> >>> >>>> >>>>I wonder if it is not better to have swap_list[16] in Crafty and add >>>>if (nc==15) break if you want to be careful not to crash maybe in 1 out of 1000 >>>>games. >>> >>>Again, I'm not sure how that would help. I only use the first few entries in >>>normal positions. Just like my move list can theoretically hold a thousand >>>moves for one side even though that can never be reached. >> >>There is a difference if you have global varaibles or local varaibles. > >Not measurable. Every function is going to have at least one local variable >for a counter or whatever. Which means that function will have to subtract 4 >from ESP and save it in EBP. There is no difference in subtracting 4 or >subtracting 4000 so the overhead there is zero. > > >> >>int swap[32] is a local varaible and I thought that the computer waste time in >>allocating memory to the local varaibles every time that you call it but maybe >>the computer does not work in that way thanks to some compiler optimiazation. >> > >See the above about how local data is allocated. "allocated" is the wrong >term as that brings up visions of "malloc()" which is wrong for local procedure >data. > > > >> >>> >>> >>> >>> >>>> >>>>It seems to me that the price of allocating memory to 16 integer is higher than >>>>the price of one if (nc==15) inside the loop. >>> >>>16 integers is 1/2 of a cache line. 32 is exactly one line, except that there >>>is no alignment forced to make the starting address exactly divisible by 128. >>> >>> >>> >>> >>>> >>>> >>>>Another point that I see is that it is using the value of the pieces and does >>>>not use piece square table. >>> >>>Correct. That will also simply confuse results. This is intended to be a >>>simple material-only estimate of "is the capture good or bad?" >>> >>> >>> >>>> >>>>I wonder if there is a reason not to use piece square table to evaluate capture >>>>of pawn in the 7th as better than capture of pawn in the second rank. >>>> >>>>Uri >>> >>>That would probably wreck the basic SEE concept. And pawns on the 7th don't >>>have much of a piece/square number for Crafty, so it really won't make any >>>difference for me. >> >>The point that I thought about is better order of captures. >>Even small difference in piece square table can generate better order of >>captures. >> >>Uri > >There is no "good or bad order". Except that you first want to use less >valuable pieces to maximize the gain of the capture. Swap() does that >well, handling the cases where pieces are behind pieces, and the first piece >in line has to be used first.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.