Computer Chess Club Archives


Subject: Re: 64 bits

Author: Keith Evans

Date: 20:49:12 06/21/02

On June 21, 2002 at 23:22:02, Robert Hyatt wrote:

>On June 21, 2002 at 22:04:09, Keith Evans wrote:
>
>>On June 21, 2002 at 21:21:50, Robert Hyatt wrote:
>>
>>>On June 21, 2002 at 17:07:54, Keith Evans wrote:
>>>
>>>>On June 21, 2002 at 15:03:56, Robert Hyatt wrote:
>>>>
>>>>>On June 20, 2002 at 21:48:10, Keith Evans wrote:
>>>>>
>>>>>>On June 20, 2002 at 20:56:44, Robert Hyatt wrote:
>>>>>>
>>>>>>>On June 20, 2002 at 14:07:50, Tom Kerrigan wrote:
>>>>>>>
>>>>>>>>On June 20, 2002 at 13:03:10, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>It could certainly be done.  However, I don't see what it would prove.
>>>>>>>>>Other than that 64 bit operations are more efficient when done in one
>>>>>>>>>"chunk" than in two.  That seems intuitive anyway.  It would also present
>>>>>>>>>a few problems with the FirstOne(), LastOne(), and PopCnt() functions that
>>>>>>>>>use assembly on the PC but not on the 64 bit machines (yet).
>>>>>>>>
>>>>>>>>How would this be a problem? Why are you talking about PCs? The experiment is to
>>>>>>>>force a 64-bit chip to use 32-bit ints for bitboards. The PC is not a 64-bit
>>>>>>>>platform (yet) so we're OBVIOUSLY not talking about it.
>>>>>>>>
>>>>>>>>As for not seeing what the experiment would prove, I assume you're joking.
>>>>>>>>
>>>>>>>>-Tom
>>>>>>>
>>>>>>>
>>>>>>>Not joking.  When you have multiple degrees of freedom, things change and it
>>>>>>>is not easy to attribute results to a specific change.  Does the compiler
>>>>>>>or cpu do better with a larger number of 32 bit instructions?  Or better with
>>>>>>>a smaller number of 64 bit operations?  Do the 32 bit operations cause
>>>>>>>unnecessary pipeline stalls due to things like the carry bit and whatever,
>>>>>>>or do they not?  Does the compiler produce equally elegant code for 32 and 64, or
>>>>>>>does it do better on one or the other?  When the 64 bit version runs 2x faster
>>>>>>>than the 32 bit version is it because of the 64 bit advantage or because of a
>>>>>>>bad 32 bit executable from the compiler?  When the 64 bit runs only 5% faster
>>>>>>>than the 32 bit version, same question?
>>>>>>
>>>>>>It sounds a little like you're being disingenuous. If you did the experiment and
>>>>>>got a result like "the 64 bit runs only 5% faster than the 32 bit version" then
>>>>>>would you ignore it because you're not sure why? And still tout the performance
>>>>>>advantages of bitboards for 64-bit machines?
>>>>>>
>>>>>>Are you interested in validating the idea that bitboards are a win on 64-bit
>>>>>>machines? We're just trying to propose an experiment which although imperfect
>>>>>>would be more reliable than mere intuition. Any ideas?
>>>>>>
>>>>>>-Keith
>>>>>
>>>>>
>>>>>So I don't trust the experiment, but if it produces results favorable to me
>>>>>I would tout 64 bit programs as the cat's meow?  But if it produces results
>>>>>unfavorable to me I would say "the test is no good"??
>>>>>
>>>>>Sorry, that isn't _me_.  The test is flawed from the _beginning_.  And no matter
>>>>>what result it shows, it won't mean a thing.  Therefore, what would be the point
>>>>>unless you have a lot of time to burn and nothing to prove???
>>>>
>>>>The test may be flawed, but is it really any more flawed than your method of
>>>>comparing performance on a P3 to performance on a McKinley and attributing the
>>>>gains in performance to the wider datapath?
>>>>
>>>>I have seen statements of yours ranging from:
>>>>
>>>>"Bitboards really don't provide anything useful as far as move generation goes,
>>>>'today'.. because everything is done with 64 bit words. If you move to a 64 bit
>>>>architecture, then they begin to pay off, but on 32 bit machines, they likely
>>>>just 'break even.'"
>>>>
>>>>to:
>>>>
>>>>"with perfect programming, I think they should be 2x faster than offset
>>>>representations, *unless* an offset generator can somehow take advantage
>>>>of 64 bit words in a way that has not been done yet..."
>>>>
>>>
>>>
>>>OK... I don't see where either of those statements is wrong.  The latter is
>>>specifically talking about generating moves.  And in a chess engine, the first
>>>task is to generate only captures.  Try that with your offset generator.  It
>>>can't be done without skipping over the empty squares in loops.  Not so for
>>>a bitboard generator.  It generates _exactly_ the set of captures you want.
>>>Want to only capture pawns?  Trivial.  Just as trivial as capturing any opponent
>>>piece.  With no "empty square" loops.
>>>
>>>Basically, generating captures is probably more than 2x faster with bitmaps,
>>>_if_ you have a 64 bit machine.  If not, it is still a bit faster overall.
>>>
>>>I don't attribute _all_ bitboard advantages to 64 bit processors, if you have
>>>been reading _carefully_.  Some of the advantages are just inherent in the
>>>data structure, such as the generate captures point above.
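
For anyone following along, that capture-generation point boils down to a
single AND per piece type. A minimal C sketch (the table and variable names
here are made up for illustration, not Crafty's actual ones):

  #include <stdint.h>

  typedef uint64_t bitboard;

  /* Precomputed attack sets, filled in by an init routine (omitted). */
  static bitboard knight_attacks[64];

  /* All knight captures from square 'sq': one AND, no loop over empty
     squares.  Restricting the targets to enemy pawns is just as cheap. */
  bitboard knight_captures(int sq, bitboard enemy_pieces)
  {
      return knight_attacks[sq] & enemy_pieces;
  }

  bitboard knight_pawn_captures(int sq, bitboard enemy_pawns)
  {
      return knight_attacks[sq] & enemy_pawns;
  }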
>>>
>>>I've not tried to run both a 32 bit and a 64 bit version on a 64 bit processor,
>>>because it seems pointless and most likely flawed for reasons already given.
>>>
>>>But logically, a 64 bit machine will run a 64 bit program faster because of
>>>the 2-for-1 instruction reduction (or better for shifts).  Some parts of my
>>>program are nothing but 64 bit operations.  Make/Unmake for example.  Other
>>>parts, like GenerateMoves(), are mostly 64 bit operations.  It is certainly
>>>logical to expect those to run faster on 64 bit processors than they run on
>>>32 bit processors.  Regardless of how good the _rest_ of the 64 bit processor
>>>really is.
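
The 2-for-1 point is easy to see in C. On a 64 bit target the first function
below compiles to a single XOR; a 32 bit compiler has to split the same update
into two halves, roughly like the second (purely illustrative, the names are
invented):

  #include <stdint.h>

  /* One 64 bit XOR toggles the from and to squares in a single operation. */
  uint64_t toggle64(uint64_t occupied, uint64_t from_to_mask)
  {
      return occupied ^ from_to_mask;
  }

  /* What a 32 bit compiler must do instead: two 32 bit XORs (and shifts
     that cross the halves cost even more than 2-for-1). */
  typedef struct { uint32_t lo, hi; } bitboard32;

  bitboard32 toggle32(bitboard32 occupied, bitboard32 from_to_mask)
  {
      occupied.lo ^= from_to_mask.lo;
      occupied.hi ^= from_to_mask.hi;
      return occupied;
  }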
>>>
>>>It is hard to compare CPUs.  But it is easy to compare data lengths.  And
>>>the 64 bit word certainly _fits_ what I am doing perfectly, for a reasonable
>>>performance gain expectation.
>>>
>>>
>>>
>>>>Is it "begin to pay off" (maybe the 10% that Eugene mentions in this thread), or
>>>>is it "2x faster"? I would honestly like to know.
>>>
>>>
>>>You are mixing apples and oranges.  One of the above points (2x) is
>>>_specifically_ addressing the GenerateCaptures code.
>>>
>>>
>>>>
>>>>Let's say that you were one of Knuth's grad students and he was preparing a tome
>>>>on chess programming. Are you going to tell him that there's no valid way to
>>>>compare the 32-bit and 64-bit performance of your bitboards?
>>>>
>>>
>>>
>>>No.  I would say "there is no way to compare them without doing a couple of
>>>years of compiler work" however.  Doable?  Yes.  Practically doable?  No.
>>>
>>>
>>>
>>>>We offered up an experiment and you shot it down. Any better ideas? I know that
>>>>Mr. Corbit is content to wait at least a year for results, but I'll bet that if a
>>>>thread titled "bitboard performance analysis" appeared tomorrow he would
>>>>click on it.
>>>>
>>>>-Keith
>>>
>>>
>>>I don't have _any_ ideas.  One _possible_ test is to take any 32 bit program
>>>of your choice, say the old gnuchess 4 program.  Run it on a 32 bit machine,
>>>then recompile and run it on McKinley.  Compute the performance gain.
>>>
>>>Do the same for crafty on the same two machines.  Compute the performance
>>>gain.  If Crafty doesn't gain significantly more, I will be greatly surprised.
>>>
>>>That is the best experiment I can suggest, and it has its own flaw, if you want
>>>to point it out...
>>
>>Thanks for your responses. I guess that for a number of reasons we'll be seeing
>>some major engine rewrites after McKinley is widely available. And Slater will
>>be running a lot of tests for people ;-)
>>
>>The testing flaw is pretty obvious, but we'll surely see a lot of results from
>>this type of experiment posted once the McKinleys are here. If Crafty scales
>>better and influences more people to try out bitboards, then we'll have even
>>more data points. And that will be interesting, as I think there will be more
>>correlation between a rewritten engine and its previous revision than between
>>entirely different engines. It still won't settle the strict 32 vs 64 question
>>of course, but until somebody comes up with a better idea...
>>
>>-Keith
>
>Note that Fritz has started the "commercial revolution" by converting to
>bitmaps.  I doubt anyone thinks Frans would switch to something that is
>not going to offer better performance...
>
>I'm waiting for some microcoding options in the processors so that we can
>add some chess-specific instructions.  I did this to an older VAX years ago
>and had interesting results.  A hardware popcnt would be nice, for example.
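
For reference, the software loop that a hardware popcnt would replace looks
something like this in portable C (just the textbook bit-clearing trick, not
anyone's tuned assembly):

  #include <stdint.h>

  /* Count the set bits in a bitboard by clearing the lowest one each pass.
     A single hardware popcnt instruction would replace this whole loop. */
  int popcnt(uint64_t bb)
  {
      int count = 0;
      while (bb) {
          bb &= bb - 1;    /* clears the lowest set bit */
          count++;
      }
      return count;
  }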
>
>:)
>
>And then the 64 vs 128 wars start.  Because I can see advantages to 128 bit
>words containing two bitboards that can be updated with one operation.  For
>yet another performance gain...
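
The two-bitboards-in-one-word idea can already be sketched with today's 128 bit
SIMD registers; something like the following SSE2 snippet (purely illustrative,
not taken from any real engine):

  #include <emmintrin.h>   /* SSE2 128 bit integer operations */
  #include <stdint.h>

  /* Pack two bitboards (say white occupancy in the low 64 bits and black
     occupancy in the high 64 bits) and toggle both with a single XOR. */
  __m128i update_pair(__m128i pair, uint64_t white_mask, uint64_t black_mask)
  {
      __m128i masks = _mm_set_epi64x((long long)black_mask,   /* high half */
                                     (long long)white_mask);  /* low half  */
      return _mm_xor_si128(pair, masks);
  }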

Then the Xiangqi programmers will start arguing ;-)

I would personally place my bets in the reconfigurable computing camp before I
would get excited about microcode. It would be cool to see a little ARM7 with a
chess coprocessor smoke an Athlon. And extra bonus points if the chess
coprocessor were contained in a Game Boy or Palm cartridge.

-Keith


