Author: Robert Hyatt
Date: 19:30:40 11/27/97
On November 27, 1997 at 10:30:29, Chris Whittington wrote:

>So, hacking your numbers around, you appear to suggest:
>
>1. 32-bit P6 to 64-bit alpha architecture change gives Crafty about 80%

I'm not sure. All I can say for sure is Bruce gets 1.75X on the 533 machine, I get 3.1X on the 500 machine. If we assume that 1.75X is pure machine cycle time, ignoring 32 vs 64 bits, then the rest is likely a result of simply using 64-bit words effectively.

>2. With some care and optimisation, you reckon you could get this to,
>say 125%

I believe so, yes. IE I know of a faster PopCnt() function for the alpha, for example. And as this turns into a hardware operation (which is rumored for the 21264) there is a lot to gain yet...

>But this doesn't imply the 64-bit bitmap approach for chess is approx 100%
>faster than 32-bit for chess, does it? Because the 32-bit Crafty code
>is an emulation of 64-bit bitmaps, not code that is written in the
>'previous' style of, say, having a byte or a word or whatever per board
>square.

I personally think that, algorithm for algorithm, offset and bitmap are equal. I have to pump two 32-bit words through the P6 for everything I do, but I get some efficiencies that make up for this. But on the alpha you are *forced* to pump 64-bit quantities around, and if you are forced to do this, it obviously pays off to use the full 64 bits if possible, to avoid pumping values around that have double the number of bits you really need. I have this vague feeling that bitmaps are going to be superior on 64-bit machines, based on the problem of pumping and using 64 bits, vs pumping 64 but using only 32, which means half of the data throughput is wasted, totally.

>The next interesting question, and one that is again semi-impossible to
>answer, is just how much better is it (as measured by nps) to be using
>bitmaps than not. Looks like the upper bound on 'better' is something
>less than your 125%. One wonders how much less...?

The question is "how many instructions per node?", I suspect. And based on my analysis, this answer is potentially lower for bitmaps. IE, for a simple example, what is involved in asking "is the pawn outside the square of the king?" I do it in one memory load for a mask, plus an AND operation and a test. IE it takes so little time I can't measure it. Many things turn out to be as efficient on a 64-bit machine. On a 32-bit machine I take a hit because everything is done twice. This is not bad on a superscalar machine like the P5/P6, because the inefficient doubled stuff simply serves to feed multiple pipes in parallel. So I keep the pipes better fed than most programs because of this. But it isn't optimal. On the 64-bit machines I lose the duplicate operations, which gives me an almost free 2x performance boost. Not using them isn't a kiss of death, but it might end up being a disadvantage until someone finds a new approach that is non-bitmap based, yet still needs to pump 64-bit values around.
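To make the "square of the king" test above concrete, here is a minimal sketch in C (not Crafty's actual code; the mask table and its indexing are assumptions, precomputed at startup so that the query itself is just the load, the AND, and the test):

  typedef unsigned long long BITBOARD;

  /* one mask per (side to move, defending king square): a bit is set for
     every square whose passed pawn can no longer be caught by that king.
     Assumed to be filled in at program startup. */
  BITBOARD outside_square_mask[2][64];

  int PawnOutsideSquare(BITBOARD passed_pawns, int king_sq, int wtm) {
    /* one load, one AND, one test */
    return (passed_pawns & outside_square_mask[wtm][king_sq]) != 0;
  }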
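The PopCnt() mentioned above, in its plainest portable form, is nothing more than this (neither Crafty's version nor the faster alpha-specific one, just the generic software count):

  typedef unsigned long long BITBOARD;

  int PopCnt(BITBOARD b) {
    int count = 0;
    while (b) {
      b &= b - 1;          /* clears the lowest set bit */
      count++;
    }
    return count;
  }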
>Also of consequence is that these figures are dependent on working to
>the Crafty-design paradigm. Loosely, I seem to recollect that you (or
>was it a Bruce description of Ferret?) create piece movement bitmaps,
>and for sliding pieces, these bitmaps don't account for blocking pieces.
>Like, for example, if you want to know if a piece is attacked, you pick
>up the enemy movement bitmaps, test for a hit on the square in question,
>and, if it's hit by a sliding piece, THEN test for a blocking piece. This
>seems fine, good and fast, if your program is not needing to do many
>piece attack queries; because if it is, there comes the point where it
>is probably better to preprocess all of these beforehand, and then not
>have to waste time over and over doing blocking-piece tests...?

First, the "blocking" test doesn't apply. I take a diagonal's contents, index into an array, and get a bitmap showing what squares this "piece" attacks on this diagonal. One memory reference and I am totally done. If I am interested in "battery" conditions and so forth, it is not hard. IE I haven't found anything I wanted to do that wasn't just as easy (or even easier) than what I did in Cray Blitz, which was *not* bitmap based at all. If I used each piece's attacks repeatedly, I'd likely avoid all the duplicate work, as a standard memory-reference optimization, because all of our machines are memory-starved and are getting worse every year...

However, the bottom line is I don't think there are any things you can do that I can't, nor any things I can do that you can't. The ideas are somewhat complementary. In fact, I have a non-bitmap board representation in Crafty (board[64]) that I use for some specific things that are easier to do that way than with bitmaps...

>Or, what I'm trying to say is that the Crafty results probably don't
>cover highly intensive evaluation functions...?

I can't answer that. IE I do mobility at zero cost. I do the pawn-square thing at zero cost. It takes some thought to get into bitmaps, but I don't see anything I did before that I can't do now, nor do I see anything that I do now that I couldn't have done before. If performance is the issue, I suspect bitmaps on 64-bit machines are going to be very hard to beat. But if you are interested only in knowledge, I don't see why they wouldn't work just as well as the traditional offset board representation, and they should still give you a performance "kick"...

But notice that it takes a lot of time to get "into" bitmaps and start thinking bitmaps. After that point is reached, they are no more obtuse or difficult to use than any other programming paradigm. And using macros can produce some pretty readable code and hide all the ands/ors/etc...
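Such a macro layer might look something like this (a sketch; the names are illustrative, not necessarily the ones Crafty uses):

  typedef unsigned long long BITBOARD;

  #define And(a,b)      ((a) & (b))
  #define Or(a,b)       ((a) | (b))
  #define Xor(a,b)      ((a) ^ (b))
  #define Shiftl(a,n)   ((a) << (n))
  #define Shiftr(a,n)   ((a) >> (n))
  #define SetMask(sq)   ((BITBOARD) 1 << (sq))

  /* so "rooks sitting on open files" can read as
     And(rooks, open_file_mask) instead of raw &'s and |'s everywhere */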
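A sketch of the one-memory-reference diagonal lookup described above; the representation details are assumptions here (not necessarily how Crafty stores things): a second copy of the occupied squares is kept in diagonal order, so the contents of the diagonal through a square fall out with a shift and a mask, and that small number indexes a precomputed attack table.

  typedef unsigned long long BITBOARD;

  /* tables assumed filled at startup */
  int      diag_shift[64];            /* where sq's diagonal sits in the
                                         diagonally-ordered occupied map   */
  BITBOARD diag_attacks[64][256];     /* (square, diagonal contents) ->
                                         squares attacked on that diagonal */

  BITBOARD DiagonalAttacks(BITBOARD occupied_diag, int sq) {
    unsigned contents = (unsigned) (occupied_diag >> diag_shift[sq]) & 0xff;
    return diag_attacks[sq][contents];   /* the attack set is one reference */
  }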