Author: Keith Evans
Date: 20:49:12 06/21/02
On June 21, 2002 at 23:22:02, Robert Hyatt wrote:

>On June 21, 2002 at 22:04:09, Keith Evans wrote:
>
>>On June 21, 2002 at 21:21:50, Robert Hyatt wrote:
>>
>>>On June 21, 2002 at 17:07:54, Keith Evans wrote:
>>>
>>>>On June 21, 2002 at 15:03:56, Robert Hyatt wrote:
>>>>
>>>>>On June 20, 2002 at 21:48:10, Keith Evans wrote:
>>>>>
>>>>>>On June 20, 2002 at 20:56:44, Robert Hyatt wrote:
>>>>>>
>>>>>>>On June 20, 2002 at 14:07:50, Tom Kerrigan wrote:
>>>>>>>
>>>>>>>>On June 20, 2002 at 13:03:10, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>It could certainly be done. However, I don't see what it would prove, other than that 64 bit operations are more efficient when done in one "chunk" than in two. That seems intuitive anyway. It would also present a few problems with the FirstOne(), LastOne() and PopCnt() functions that use assembly on the PC but not on the 64 bit machines (yet).
>>>>>>>>
>>>>>>>>How would this be a problem? Why are you talking about PCs? The experiment is to force a 64-bit chip to use 32-bit ints for bitboards. The PC is not a 64-bit platform (yet), so we're OBVIOUSLY not talking about it.
>>>>>>>>
>>>>>>>>As for not seeing what the experiment would prove, I assume you're joking.
>>>>>>>>
>>>>>>>>-Tom
>>>>>>>
>>>>>>>Not joking. When you have multiple degrees of freedom, things change and it is not easy to attribute results to a specific change. Does the compiler or cpu do better with a larger number of 32 bit instructions, or better with a smaller number of 64 bit operations? Do the 32 bit operations cause unnecessary pipeline stalls due to things like the carry bit and whatever, or do they not? Does the compiler produce as elegant code for 32 and 64, or does it do better on one or the other? When the 64 bit version runs 2x faster than the 32 bit version, is it because of the 64 bit advantage or because of a bad 32 bit executable from the compiler? When the 64 bit runs only 5% faster than the 32 bit version, same question?
>>>>>>
>>>>>>It sounds a little like you're being disingenuous. If you did the experiment and got a result like "the 64 bit runs only 5% faster than the 32 bit version", would you then ignore it because you're not sure why? And still tout the performance advantages of bitboards for 64-bit machines?
>>>>>>
>>>>>>Are you interested in validating the idea that bitboards are a win on 64-bit machines? We're just trying to propose an experiment which, although imperfect, would be more reliable than mere intuition. Any ideas?
>>>>>>
>>>>>>-Keith
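As a concrete illustration of the experiment being proposed, the comparison amounts to something like the following minimal C sketch (purely illustrative; the type and function names are invented and do not come from Crafty or any other engine): the same bitboard operation done once on a native 64-bit word and once on the same board split into two 32-bit halves, which is roughly what a 32-bit build is forced to do.

    #include <stdint.h>

    typedef uint64_t Bitboard64;      /* one native 64-bit word */

    typedef struct {
        uint32_t lo;                  /* squares a1-h4 */
        uint32_t hi;                  /* squares a5-h8 */
    } Bitboard32;                     /* the same board as two 32-bit halves */

    /* One OR: a single instruction on a 64-bit machine. */
    static Bitboard64 or64(Bitboard64 a, Bitboard64 b) {
        return a | b;
    }

    /* The same OR on the split representation: two 32-bit instructions. */
    static Bitboard32 or32(Bitboard32 a, Bitboard32 b) {
        Bitboard32 r;
        r.lo = a.lo | b.lo;
        r.hi = a.hi | b.hi;
        return r;
    }

    /* Shifts are worse than 2-for-1, because bits have to be carried
       across the 32-bit boundary.  (Assumes 0 < n < 32.) */
    static Bitboard32 shl32(Bitboard32 a, int n) {
        Bitboard32 r;
        r.hi = (a.hi << n) | (a.lo >> (32 - n));
        r.lo = a.lo << n;
        return r;
    }

Whether the compiler turns the one-word version into better code than the split version is, of course, exactly the degree of freedom being argued about above.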
>>>>>So I don't trust the experiment, but if it produces results favorable to me I would tout 64 bit programs as the cat's meow? But if it produces results unfavorable to me I would say "the test is no good"?
>>>>>
>>>>>Sorry, that isn't _me_. The test is flawed from the _beginning_, and no matter what result it shows, it won't mean a thing. Therefore, what would be the point, unless you have a lot of time to burn and nothing to prove?
>>>>
>>>>The test may be flawed, but is it really any more flawed than your method of comparing performance on a P3 to performance on a McKinley and attributing the gains in performance to the wider datapath?
>>>>
>>>>I have seen statements of yours ranging from:
>>>>
>>>>"Bitboards really don't provide anything useful as far as move generation goes, 'today'.. because everything is done with 64 bit words. If you move to a 64 bit architecture, then they begin to pay off, but on 32 bit machines, they likely just 'break even.'"
>>>>
>>>>to:
>>>>
>>>>"with perfect programming, I think they should be 2x faster than offset representations, *unless* an offset generator can somehow take advantage of 64 bit words in a way that has not been done yet..."
>>>
>>>OK... I don't see where either of those statements is wrong. The latter is specifically talking about generating moves. And in a chess engine, the first task is to generate only captures. Try that with your offset generator. It can't be done without skipping over the empty squares in loops. Not so for a bitboard generator. It generates _exactly_ the set of captures you want. Want to only capture pawns? Trivial. Just as trivial as capturing any opponent piece. With no "empty square" loops.
>>>
>>>Basically, generating captures is probably more than 2x faster with bitmaps, _if_ you have a 64 bit machine. If not, it is still a bit faster overall.
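To make the capture-generation point concrete, here is a minimal sketch of the bitboard approach (illustrative only, with invented names rather than Crafty's actual routines, and using a gcc-style bit-scan builtin in place of the FirstOne() assembly): one AND of a piece's attack bitboard with the opponent's piece bitboard yields exactly the set of capture targets, and ANDing with the opponent's pawn bitboard instead gives "capture only pawns" at the same cost.

    #include <stdint.h>

    typedef uint64_t Bitboard;

    /* Return the number of capture targets, writing the target squares
       into moves[].  'attacks' is the set of squares one piece attacks,
       'enemy' is the set of opponent men we are allowed to capture
       (all pieces, or just pawns, or whatever subset is wanted). */
    static int generate_captures(Bitboard attacks, Bitboard enemy, int moves[64]) {
        Bitboard targets = attacks & enemy;    /* exactly the capture squares */
        int n = 0;
        while (targets) {
            int to = __builtin_ctzll(targets); /* FirstOne()-style bit scan */
            moves[n++] = to;                   /* record the target square */
            targets &= targets - 1;            /* clear the bit just handled */
        }
        return n;
    }

The loop only visits occupied target squares; an offset generator would have to step along each ray and skip the empty squares explicitly.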
>>>I don't attribute _all_ bitboard advantages to 64 bit processors, if you have been reading _carefully_. Some of the advantages are just inherent in the data structure, such as the generate-captures point above.
>>>
>>>I've not tried to run both a 32 bit and a 64 bit version on a 64 bit processor, because it seems pointless and most likely flawed for reasons already given.
>>>
>>>But logically, a 64 bit machine will run a 64 bit program faster because of the 2-for-1 instruction reduction (or better for shifts). Some parts of my program are nothing but 64 bit operations, Make/Unmake for example. Other parts like GenerateMoves() are mostly 64 bit operations. It is certainly logical to expect those to run faster on 64 bit processors than they run on 32 bit processors, regardless of how good the _rest_ of the 64 bit processor really is.
>>>
>>>It is hard to compare CPUs. But it is easy to compare data lengths. And the 64 bit word certainly _fits_ what I am doing perfectly, for a reasonable performance gain expectation.
>>>
>>>>Is it "begin to pay off" (maybe the 10% that Eugene mentions in this thread), or is it "2x faster"? I would honestly like to know.
>>>
>>>You are mixing apples and oranges. One of the above points (2x) is _specifically_ addressing the GenerateCaptures code.
>>>
>>>>Let's say that you were one of Knuth's grad students and he was preparing a tome on chess programming. Are you going to tell him that there's no valid way to compare the 32-bit and 64-bit performance of your bitboards?
>>>
>>>No. I would say "there is no way to compare them without doing a couple of years of compiler work", however. Doable? Yes. Practically doable? No.
>>>
>>>>We offered up an experiment and you shot it down. Any better ideas? I know that Mr. Corbit is content to wait at least a year for results, but I'll bet that if a thread titled "bitboard performance analysis" appeared tomorrow, he would click on it.
>>>>
>>>>-Keith
>>>
>>>I don't have _any_ ideas. One _possible_ test is to take any 32 bit program of your choice, say the old gnuchess 4 program. Run it on a 32 bit machine, then recompile and run it on McKinley. Compute the performance gain.
>>>
>>>Do the same for Crafty on the same two machines. Compute the performance gain. If Crafty doesn't gain significantly more, I will be greatly surprised.
>>>
>>>That is the best experiment I can suggest, and it has its own flaw, if you want to point it out...
>>
>>Thanks for your responses. I guess that for a number of reasons we'll be seeing some major engine rewrites after McKinley is widely available. And Slater will be running a lot of tests for people ;-)
>>
>>The testing flaw is pretty obvious, but we'll surely see a lot of results from this type of experiment posted once the McKinleys are here. If Crafty scales better and influences more people to try out bitboards, then we'll have even more data points. And that will be interesting, as I think there will be more correlation between a rewritten engine and its previous revision than between entirely different engines. It still won't settle the strict 32 vs 64 question, of course, but until somebody comes up with a better idea...
>>
>>-Keith
>
>Note that Fritz has started the "commercial revolution" by converting to bitmaps. I doubt anyone thinks Frans would switch to something that is not going to offer better performance...
>
>I'm waiting for some microcoding options in the processors so that we can add some chess-specific instructions. I did this to an older VAX years ago and had interesting results. A hardware popcnt would be nice, for example.
>
>:)
>
>And then the 64 vs 128 wars start. Because I can see advantages to 128 bit words containing two bitboards that can be updated with one operation. For yet another performance gain...

Then the Xiangqi programmers will start arguing ;-)

I would personally place my bets in the reconfigurable computing camp before I would get excited about microcode. It would be cool to see a little ARM7 with a chess coprocessor smoke an Athlon. And extra bonus points if the chess coprocessor were contained in a Game Boy or Palm cartridge.

-Keith
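For what it's worth, the 128-bit idea could already be sketched with SSE2 intrinsics, though whether it pays off in practice is exactly the kind of question this thread is about. A speculative illustration only, with invented names: two bitboards that change identically on a quiet move, packed into one 128-bit register so that a single XOR with the from/to mask updates both.

    #include <stdint.h>
    #include <emmintrin.h>             /* SSE2 intrinsics */

    /* Two bitboards that are updated identically on a quiet (non-capture)
       white move, packed into one 128-bit register.  Illustrative only. */
    typedef struct {
        __m128i pair;                  /* [ white_pieces | occupied ] */
    } PackedBoards;

    static void make_quiet_white_move(PackedBoards *b, uint64_t from_to_mask) {
        /* Broadcast the from/to mask into both 64-bit lanes and flip
           both boards with a single 128-bit XOR. */
        __m128i m = _mm_set1_epi64x((long long)from_to_mask);
        b->pair = _mm_xor_si128(b->pair, m);
    }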