Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: about SEE of Crafty

Author: Robert Hyatt
Date: 06:47:12 01/07/04
On January 07, 2004 at 08:00:42, Vasik Rajlich wrote:

>On January 06, 2004 at 11:33:45, Robert Hyatt wrote:
>
>>On January 06, 2004 at 09:56:17, Uri Blass wrote:
>>
>>>On January 06, 2004 at 09:35:54, Robert Hyatt wrote:
>>>
>>>>On January 05, 2004 at 16:39:04, Uri Blass wrote:
>>>>
>>>>>I see that crafty is using swap_list[32] and I think that the array is too long.
>>>>>
>>>>>There are only 8 queen directions and 8 knight directions and I think that for
>>>>>practical cases swap_list[16] can make Crafty faster with no problems(in theory
>>>>>it is possible that 16 is not enough but I do not imagine a practical case when
>>>>>it can happen.
>>>>
>>>>How would it make it faster?  I only use the part of the list that is
>>>>actually necessary.
>>>
>>>The question is if allocating memory does not cost speed.
>>
>>And the answer is _no_.  :)
>>
>>Now if you use malloc() that is another issue, but here we are not using
>>malloc.
>>
>>If you don't know much about the X86 architecture, here is the basic idea.
>>
>>When you enter a procedure, the first thing you do is subtract some constant
>>from the stack pointer, where this "constant" is the amount of memory this
>>function needs for local variables.  Now you save this address in EBP,
>>a special register used specifically for this purpose.  Any further push
>>instructions will modify ESP, but EBP is left alone and will always point to
>>the first of 128 bytes you can use for local memory in this function.
>>
>>Lets assume that the 32 int array is all there is that is local.  I'd just
>>subtract 128 (bytes) from ESP and copy that to EBP.  Now EBP points
>>to the start of a 128 byte "work area" on the stack that will not be modified
>>by stack operations since the stack pointer is now well below these addresses
>>(on the PC the stack grows in the negative direction, ie 1000, then 999,
>>etc.)  Local variables are now referenced as an offset from EBP and that is
>>all there is to it.  No overhead whatsoever (unless you count a subtract
>>from ESP and then a move to EBP, which is exactly two X86 instructions.
>
>Wouldn't the work area be moved into one of the high-priority on-processor
>caches when the function is executed, so that bigger work area => more space
>taken up in this cache? (Although 16 bytes is trivial ..)
>
>I've been trying to avoid doing things like having multiple evaluation functions
>because more code paths => higher usage of the cache. I notice that in crafty
>you have a "search" and a "search of root move" - are you quite sure that this
>is faster than one "search" with a few conditional statements?
>
>Vas

Speed wasn't the issue.  Readability was.  Of course, it is also faster,
since doing conditional tests at every node hurts performance...

Branches are bad news on long-pipe machines like the PIV...

However, for Crafty, it simply makes the code easier to read.  No hash
probe at the root, as it makes no sense.  No worrying about several things
that are done at the root but not elsewhere, and vice-versa...



>
>>
>>>
>>>>
>>>>Also, shortening the list will wreck things for those that set up oddball
>>>>positions.  It is certainly possible to have 16 pieces for each side attacking
>>>>a single square, since my Swap() understands "x-ray attacks" where two pieces
>>>>are in battery along a rank/file/diagonal.  I need 32 to handle the worst case
>>>>of both sides having 9 queens, two rooks, two bishops, two knights and a king
>>>>attacking the target square...
>>>
>>>The bishops cannot attack the same square so the worst case is 30 pieces
>>>It is practically even less because it is impossible to promote pawns to queens
>>>without some captures and only 2 pawns can attack the same square.
>>
>>Depends.  We might be talking about a real game.  Or we might be talking about
>>a bizarre position someone sets up.  IE Crafty can work with N queens, where N
>>is _large_.  Some programs just blow up, but I didn't want that to happen.
>>So you can cheerfully set up a position with 32+ white queens and Crafty will
>>cheerfully search the resulting tree with no problems.
>>
>>
>>
>>
>>>
>>>
>>>>
>>>>>
>>>>>I wonder if it is not better to have swap_list[16] in Crafty and add
>>>>>if (nc==15) break if you want to be careful not to crash maybe in 1 out of 1000
>>>>>games.
>>>>
>>>>Again, I'm not sure how that would help.  I only use the first few entries in
>>>>normal positions.  Just like my move list can theoretically hold a thousand
>>>>moves for one side even though that can never be reached.
>>>
>>>There is a difference if you have global varaibles or local varaibles.
>>
>>Not measurable.  Every function is going to have at least one local variable
>>for a counter or whatever.  Which means that function will have to subtract 4
>>from ESP and save it in EBP.  There is no difference in subtracting 4 or
>>subtracting 4000 so the overhead there is zero.
>>
>>
>>>
>>>int swap[32] is a local varaible and I thought that the computer waste time in
>>>allocating memory to the local varaibles every time that you call it but maybe
>>>the computer does not work in that way thanks to some compiler optimiazation.
>>>
>>
>>See the above about how local data is allocated.  "allocated" is the wrong
>>term as that brings up visions of "malloc()" which is wrong for local procedure
>>data.
>>
>>
>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>It seems to me that the price of allocating memory to 16 integer is higher than
>>>>>the price of one if (nc==15) inside the loop.
>>>>
>>>>16 integers is 1/2 of a cache line.  32 is exactly one line, except that there
>>>>is no alignment forced to make the starting address exactly divisible by 128.
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>Another point that I see is that it is using the value of the pieces and does
>>>>>not use piece square table.
>>>>
>>>>Correct.  That will also simply confuse results.  This is intended to be a
>>>>simple material-only estimate of "is the capture good or bad?"
>>>>
>>>>
>>>>
>>>>>
>>>>>I wonder if there is a reason not to use piece square table to evaluate capture
>>>>>of pawn in the 7th as better than capture of pawn in the second rank.
>>>>>
>>>>>Uri
>>>>
>>>>That would probably wreck the basic SEE concept.  And pawns on the 7th don't
>>>>have much of a piece/square number for Crafty, so it really won't make any
>>>>difference for me.
>>>
>>>The point that I thought about is better order of captures.
>>>Even small difference in piece square table can generate better order of
>>>captures.
>>>
>>>Uri
>>
>>There is no "good or bad order".  Except that you first want to use less
>>valuable pieces to maximize the gain of the capture.  Swap() does that
>>well, handling the cases where pieces are behind pieces, and the first piece
>>in line has to be used first.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.