Author: Robert Henry Durrett
Date: 15:55:50 06/21/02
On June 21, 2002 at 17:09:33, Robert Hyatt wrote:

>On June 20, 2002 at 13:29:36, Keith Evans wrote:
>
>>On June 20, 2002 at 13:03:10, Robert Hyatt wrote:
>>
>>>On June 20, 2002 at 12:30:47, Keith Evans wrote:
>>>
>>>>On June 19, 2002 at 23:27:51, Robert Hyatt wrote:
>>>>
>>>>>On June 19, 2002 at 20:45:33, Keith Evans wrote:
>>>>>
>>>>>>On June 19, 2002 at 14:33:56, Tom Kerrigan wrote:
>>>>>>
>>>>>>>On June 19, 2002 at 13:10:42, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>I don't care about the 32 bit specint. I care about the fact that a
>>>>>>>>1.4ghz pentium runs Crafty at about 750K nodes per second. The 600mhz
>>>>>>>>21264 ran it at over 800K. And 600mhz is _not_ the fastest 21264 around.
>>>>>>>>
>>>>>>>>The 1ghz mckinley runs it twice as fast as that 1.4ghz pentium, 1.5M nodes
>>>>>>>>per second. _that_ is definitely "something to get excited about" IMHO..
>>>>>>>
>>>>>>>So you like the 21264 and the McKinley. That's great. Maybe you can start a fan
>>>>>>>club, instead of posting to a thread where people are trying to have an
>>>>>>>intelligent conversation about 64-bit computing.
>>>>>>>
>>>>>>>-Tom
>>>>>>
>>>>>>Is there an easy way to compare a 1.4 GHz P3 to a 1 GHz McKinley and see where
>>>>>>this Crafty performance increase is coming from? I'm not at all familiar with
>>>>>>McKinley, but would it be possible to run a version of Crafty compiled for
>>>>>>32-bits on a McKinley and compare that to a Crafty compiled for 64-bits on
>>>>>>McKinley? Is this a dumb idea? If this isn't possible, then it's going to be
>>>>>>difficult to tell where the performance gain is really coming from.
>>>>>>
>>>>>>-Keith
>>>>>
>>>>>I don't know that you could do this. It would require that the compiler know
>>>>>how to implement 64 bit ints as 2x32 bits, which on a mckinley would be a waste
>>>>>of the compiler-writer's time...
>>>>
>>>>So what's wrong with Tom's suggestion - "You can make a bitboard class that
>>>>contains two 32-bit ints and overload all the int operators and run it on a
>>>>64-bit chip. ... renaming his source files from .c to .cpp and writing this
>>>>simple class"
>>>>
>>>>If this is a valid experiment, then why not do it and settle the argument?
>>>>Someone might even consider publishing a paper on it. (Maybe Tom would volunteer
>>>>to hack the code if you don't want to bother?) If it's not valid, then what's
>>>>wrong with it? Do you think that the compiler would outsmart Tom and use 64-bit
>>>>words for the bitboards anyways?
>>>>
>>>>What would your prediction for such an experiment be? That the version with 2 x
>>>>32-bit bitboards would run half as fast as the version with 64-bit bitboards?
>>>>
>>>>I'm pretty sure that we could find some willing volunteers to run some simple
>>>>experiments on their hardware.
>>>>
>>>>-Keith
>>>
>>>
>>>It could certainly be done. However, I don't see what it would prove.
>>>Other than that 64 bit operations are more efficient when done in one
>>>"chunk" than in two. That seems intuitive anyway. It would also present
>>>a few problems, with the FirstOne(), LastOne(), and PopCnt() functions that
>>>use assembly on the PC but not on the 64 bit machines (yet).
>>>
>>>Remember that my comparison results are for the "best" 32 bit crafty I have
>>>vs the 64 bit machines. Unfortunately, the 64 bit machines have no assembly
>>>so they are at a significant disadvantage to start with, yet they are
>>>blazingly fast. When I have access to one for a period of time, that
>>>advantage will go away completely. Then it will be 32 vs 64 in a _real_
>>>comparison, not a biased-toward-32-bit mode as it is done today.
>>
>>I think the issue is that there may be other things (besides the word length)
>>changed on the 64-bit processor which will also improve performance.
>>I hope that
>>at the very least you would also run other chess programs (non-bitboard) on the
>>64-bit processor and see what type of speed increase they get versus older
>>32-bit processors.
>>
>>I think that the point of the experiment that we proposed ("what it would
>>prove") is what portion of the Crafty speed-up is due to the bitboard
>>structures, and what portion is due to other factors.
>>
>>-Keith
>
>
>The idea is good. But the implementation is flawed. We need a compiler that
>can produce identical code, except that bitboard ops are done by either 64
>bit hardware or by 32 bit operations. But all else stays the same.

Perhaps that is not fair. [Overly restrictive.]

It is not at all clear to me that compilers made for "the world of 64-bit," and all that implies, will produce IDENTICAL code [except for word length]. I have trouble with the idea that 64-bit will be "32-bit business as usual."

Recall what happened when everybody went from 16-bit to 32-bit. Do all modern 32-bit compilers put out code identical [except for word length] to that put out by the older 16-bit compilers?

I expect that people will figure out how to do new things in "64-bit" which were either impossible or impractical in "32-bit." The compiler designers will do that too.

>Unfortunately, compilers don't behave like that. Most likely the 32 bit
>operations will be poorly optimized, or even flawed. Because what compiler
>writer would _really_ want to do both well when the machine is really 64 bits?
>
>I've done my share of compiler work, and the optimizer is a _lot_ of work. I
>spent the time on the parts that made "native" code work the fastest. I didn't
>care about oddball cases.
>
>When an experiment has no useful output, there is little point in doing it.
>We assume the 64 bit stuff is done efficiently. If the processor has lots
>of pipes, it might be that doing things in 32 bit chunks is just as fast,
>if the pipes are mostly idle in the 64 bit program...
>If the 32 bit code is
>ugly, the 64 bit program might be way faster. Or barely faster. If the 32 bit
>code is very good, the 64 bit program might be much faster. Or a little faster.
>
>Too many variables. No way to conclude much of anything.
>
>I tend to rely on logic in such cases. Whenever I can replace two (or
>more in the case of shifts or adds/subtracts) instructions by one, I tend
>to do that on general principles. Obviously 64 bit hardware can't be slower
>on 64 bit programs than on 32 bit programs. Obviously, when you think about
>the machine instructions to do things like AND/OR/XOR/SHIFT/etc you produce
>fewer instructions for 64 bit machines than you do for 32 bit machines. Fewer
>instructions -> fewer memory references, 2x the thruput for those 64 bit ops
>vs 32 bits (>2x for some like shift).
>
>It seems obvious to me that 64 bit machines are better for 64 bit programs than
>they are for 32 bit programs. How much obviously depends on how the 64 bit
>program depends on 64 bit operations. IE I can't imagine that Crafty does
>less than 25% 64 bit operations. That represents a significant performance
>boost. And as time goes on, more 64 bit operations will be added since the
>speed penalty will be less.
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.