Author: Robert Hyatt
Date: 18:21:50 06/21/02
On June 21, 2002 at 17:07:54, Keith Evans wrote:

>On June 21, 2002 at 15:03:56, Robert Hyatt wrote:
>
>>On June 20, 2002 at 21:48:10, Keith Evans wrote:
>>
>>>On June 20, 2002 at 20:56:44, Robert Hyatt wrote:
>>>
>>>>On June 20, 2002 at 14:07:50, Tom Kerrigan wrote:
>>>>
>>>>>On June 20, 2002 at 13:03:10, Robert Hyatt wrote:
>>>>>
>>>>>>It could certainly be done. However, I don't see what it would prove.
>>>>>>Other than that 64 bit operations are more efficient when done in one
>>>>>>"chunk" than in two. That seems intuitive anyway. It would also present
>>>>>>a few problems, with the FirstOne() and LastOne() PopCnt() functions that
>>>>>>use assembly on the PC but not on the 64 bit machines (yet).
>>>>>
>>>>>How would this be a problem? Why are you talking about PCs? The experiment is to
>>>>>force a 64-bit chip to use 32-bit ints for bitboards. The PC is not a 64-bit
>>>>>platform (yet) so we're OBVIOUSLY not talking about it.
>>>>>
>>>>>As for not seeing what the experiment would prove, I assume you're joking.
>>>>>
>>>>>-Tom
>>>>
>>>>
>>>>Not joking. When you have multiple degrees of freedom, things change and it
>>>>is not easy to attribute results to a specific change. Does the compiler
>>>>or cpu do better with a larger number of 32 bit instructions? Or better with
>>>>a smaller number of 64 bit operations? Do the 32 bit operations cause
>>>>unnecessary pipeline stalls due to things like the carry bit and whatever,
>>>>or do they not? Does the compiler produce as elegant a code for 32 and 64 or
>>>>does it do better on one or the other? When the 64 bit version runs 2x faster
>>>>than the 32 bit version is it because of the 64 bit advantage or because of a
>>>>bad 32 bit executable from the compiler? When the 64 bit runs only 5% faster
>>>>than the 32 bit version, same question?
>>>
>>>It sounds a little like you're being disingenuous.
>>>If you did the experiment and
>>>got a result like "the 64 bit runs only 5% faster than the 32 bit version" then
>>>would you ignore it because you're not sure why? And still tout the performance
>>>advantages of bitboards for 64-bit machines?
>>>
>>>Are you interested in validating the idea that bitboards are a win on 64-bit
>>>machines? We're just trying to propose an experiment which although imperfect
>>>would be more reliable than mere intuition. Any ideas?
>>>
>>>-Keith
>>
>>
>>So I don't trust the experiment, but if it produces results favorable to me
>>I would tout 64 bit programs as the cat's meow? But if it produces results
>>unfavorable to me I would say "the test is no good"??
>>
>>Sorry, that isn't _me_. The test is flawed from the _beginning_. And no matter
>>what result it shows, it won't mean a thing. Therefore, what would be the point
>>unless you have a lot of time to burn and nothing to prove???
>
>The test may be flawed, but is it really any more flawed than your method of
>comparing performance on a P3 to performance on a McKinley and attributing the
>gains in performance to the wider datapath?
>
>I have seen statements of yours ranging from:
>
>"Bitboards really don't provide anything useful as far as move generation goes,
>'today'.. because everything is done with 64 bit words. If you move to a 64 bit
>architecture, then they begin to pay off, but on 32 bit machines, they likely
>just 'break even.'"
>
>to:
>
>"with perfect programming, I think they should be 2x faster than offset
>representations, *unless* an offset generator can somehow take advantage
>of 64 bit words in a way that has not been done yet..."
>

OK... I don't see where either of those statements is wrong. The latter is
specifically talking about generating moves. And in a chess engine, the first
task is to generate only captures. Try that with your offset generator. It
can't be done without skipping over the empty squares in loops. Not so for a
bitboard generator.
It generates _exactly_ the set of captures you want. Want to only capture
pawns? Trivial. Just as trivial as capturing any opponent piece. With no
"empty square" loops. Basically, generating captures is probably more than 2x
faster with bitmaps, _if_ you have a 64 bit machine. If not, it is still a bit
faster overall.

I don't attribute _all_ bitboard advantages to 64 bit processors, if you have
been reading _carefully_. Some of the advantages are just inherent in the data
structure, such as the generate captures point above.

I've not tried to run both a 32 bit and a 64 bit version on a 64 bit processor,
because it seems pointless and most likely flawed for reasons already given.
But logically, a 64 bit machine will run a 64 bit program faster because of the
2-for-1 instruction reduction (or better for shifts).

Some parts of my program are nothing but 64 bit operations. Make/Unmake for
example. Other parts like GenerateMoves() are mostly 64 bit operations. It is
certainly logical to expect those to run faster on 64 bit processors than they
run on 32 bit processors. Regardless of how good the _rest_ of the 64 bit
processor really is.

It is hard to compare CPUs. But it is easy to compare data lengths. And the
64 bit word certainly _fits_ what I am doing perfectly, for a reasonable
performance gain expectation.

>Is it "begin to pay off" (maybe the 10% that Eugene mentions in this thread), or
>is it "2x faster"? I would honestly like to know.

You are mixing apples and oranges. One of the above points (2x) is
_specifically_ addressing the GenerateCaptures code.

>
>Let's say that you were one of Knuth's grad students and he was preparing a tome
>on chess programming. Are you going to tell him that there's no valid way to
>compare the 32-bit and 64-bit performance of your bitboards?
>

No. I would say "there is no way to compare them without doing a couple of
years of compiler work" however. Doable? Yes. Practically doable? No.
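
To make the capture-generation point above concrete, here is a minimal sketch in C. It is not Crafty's code: the square numbering (a1 = 0, h8 = 63), the names knight_attacks(), knight_captures() and lowest_bit(), and the portable bit-scan loop are all assumptions made for illustration. The idea it shows is that ANDing an attack set with a target set yields exactly the captures wanted, so no empty square is ever visited.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Bitboard;

/* Assumed layout for this sketch: bit 0 = a1, bit 7 = h1, bit 63 = h8. */
#define FILE_A 0x0101010101010101ULL
#define FILE_B 0x0202020202020202ULL
#define FILE_G 0x4040404040404040ULL
#define FILE_H 0x8080808080808080ULL

/* Every square a knight on `sq` attacks; file masks stop wrap-around. */
Bitboard knight_attacks(int sq) {
    Bitboard b = 1ULL << sq;
    Bitboard a = 0;
    a |= (b & ~FILE_H) << 17;            /* up 2, right 1   */
    a |= (b & ~FILE_A) << 15;            /* up 2, left 1    */
    a |= (b & ~(FILE_G | FILE_H)) << 10; /* up 1, right 2   */
    a |= (b & ~(FILE_A | FILE_B)) << 6;  /* up 1, left 2    */
    a |= (b & ~FILE_A) >> 17;            /* down 2, left 1  */
    a |= (b & ~FILE_H) >> 15;            /* down 2, right 1 */
    a |= (b & ~(FILE_A | FILE_B)) >> 10; /* down 1, left 2  */
    a |= (b & ~(FILE_G | FILE_H)) >> 6;  /* down 1, right 2 */
    return a;
}

/* Index of the lowest set bit. Portable stand-in for a hardware bit scan. */
int lowest_bit(Bitboard b) {
    int i = 0;
    assert(b != 0);
    while (!(b & 1)) {
        b >>= 1;
        i++;
    }
    return i;
}

/* Write the destination squares of all knight captures onto `targets`
   into `to_squares` and return how many were found. The loop runs once
   per capture, never once per empty square. */
int knight_captures(int from, Bitboard targets, int *to_squares) {
    Bitboard caps = knight_attacks(from) & targets;
    int n = 0;
    while (caps) {
        to_squares[n++] = lowest_bit(caps);
        caps &= caps - 1; /* clear the bit just extracted */
    }
    return n;
}
```

Passing the bitboard of all enemy pieces as `targets` gives every capture; passing only the enemy pawn bitboard gives only pawn captures, with no other change to the code. On a 32 bit machine each 64 bit operation here has to be carried out on two words.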
>We offered up an experiment and you shot it down. Any better ideas? I know that
>Mr. Corbit is content to wait at least a year for results, but I'll bet that if a
>thread titled "bitboard performance analysis" appeared tomorrow that he would
>click on it.
>
>-Keith

I don't have _any_ ideas. One _possible_ test is to take any 32 bit program of
your choice, say the old gnuchess 4 program. Run it on a 32 bit machine, then
recompile and run it on McKinley. Compute the performance gain. Do the same
for Crafty on the same two machines. Compute the performance gain. If Crafty
doesn't gain significantly more, I will be greatly surprised.

That is the best experiment I can suggest, and it has its own flaw, if you want
to point it out...
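
The "2-for-1 instruction reduction (or better for shifts)" remark earlier in this post can be sketched as follows. This is only an illustration of the general idea, assuming a hypothetical Split64 type that holds a bitboard as two 32 bit words; it is not what any particular compiler emits and not code from Crafty. An AND costs two 32 bit operations, while a shift costs more than two because bits have to be carried across the word boundary.

```c
#include <assert.h>
#include <stdint.h>

/* A 64 bit value held as two 32 bit words, as a 32 bit machine must do. */
typedef struct {
    uint32_t lo; /* bits 0..31  */
    uint32_t hi; /* bits 32..63 */
} Split64;

Split64 split64_from(uint64_t v) {
    Split64 s = { (uint32_t)v, (uint32_t)(v >> 32) };
    return s;
}

uint64_t split64_to(Split64 s) {
    return ((uint64_t)s.hi << 32) | s.lo;
}

/* AND: one 64 bit operation becomes two 32 bit operations. */
Split64 split64_and(Split64 a, Split64 b) {
    Split64 r = { a.lo & b.lo, a.hi & b.hi };
    return r;
}

/* Left shift by n (0..63): two shifts, an OR, and a carry from lo into hi,
   plus a test on n when the shift count is not known at compile time. */
Split64 split64_shl(Split64 a, unsigned n) {
    Split64 r;
    assert(n < 64);
    if (n == 0) {
        r = a;
    } else if (n < 32) {
        r.hi = (a.hi << n) | (a.lo >> (32 - n)); /* carry bits into hi */
        r.lo = a.lo << n;
    } else {
        r.hi = a.lo << (n - 32); /* everything moves into the high word */
        r.lo = 0;
    }
    return r;
}
```

Counting instructions this way says nothing about pipeline behaviour or compiler quality, which is the objection raised above against the 32 bit versus 64 bit experiment.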
Last modified: Thu, 15 Apr 21 08:11:13 -0700