Author: Robert Hyatt
Date: 20:38:31 02/29/04
On February 29, 2004 at 19:39:38, Vincent Diepeveen wrote:

>On February 29, 2004 at 16:04:53, Gerd Isenberg wrote:
>
>>On February 29, 2004 at 15:13:49, David Mitchell wrote:
>>
>>>On February 29, 2004 at 14:44:54, Martin Schreiber wrote:
>>>
>>>>Hi,
>>>>
>>>>I've two questions:
>>>>
>>>>1.)
>>>>is using bitboards a necessary condition to write a strong chess engine? And if
>>>>not so, what other good/fast solution we have for the board representation?
>>>>
>>>>2.)
>>>>And are there strong freeware or commercial chess engines, which don't use
>>>>bitboards?
>>>>And what kind of board representation they use?
>>>>
>>>>Thanks for your comments
>>>>Martin
>>>
>>>1. No, bitboards are not necessary in order to write a strong chess engine.
>>>2. I would guess 0x88 is as fast as bitboards for 64 bit cpu's, and slightly
>>>faster than bitboards on 32 bit cpu's. Hard to make a direct comparison because
>>>with bitboards, you get more info your program can use later in the eval, etc.
>>>
>>>If you click on Computer Resource Center -> Chess links, and select Crafty, you
>>>can find and d/l an excellent write up by Robert Hyatt on this subject.
>>>
>>>Bitboards take a while to learn to use well. Many commercial programs have not
>>>used them in the past, but may in the future if the 64 bit cpu's become quite
>>>popular, because of the 2x (at least) speed up bitboards achieve on them.
>>
>>Not per se with AMD64 or intel64.
>>
>>64-bit instructions do have an additional prefix byte.
>>So the codesize advantage may only be 3/4 instead of 1/2.
>>
>>Latency of 64 bit instructions is sometimes worse (bsf, mul).
>>
>>Two independent 32-bit instructions are likely to gain more parallelism.
>>
>>It doesn't matter much, whether 1*64 or 2*32 bit are loaded/stored, considering
>>some latency and internal bus widths.
>>
>>More important features with AMD64, and that is not only helpful for bitboards,
>>are the doubled register-file size, the bigger 2.level cache, improved branch
>>prediction, two more pipe stages and more.
>>
>>OTOH register hungry bitboard algorithms which are not efficiently possible with
>>x86-32 became more interesting now.
>>
>>Gerd
>
>The best way to see the relativeness of this all is seeing the speed win of
>crafty at specint when moving from 32 bits to 64 bits. 1Ghz alpha 21264 which
>can retire 4 instructions a cycle (8 issue wide), with huge level caches
>(especially huge L1) and great branch prediction, was the same speed for crafty
>like a 1.33Ghz K7.

So your 1.33ghz K7 can run about 1.5 M nodes per second with Crafty? I don't think so. A 600mhz 21264 was over 900K. 1ghz is around 1.5M...

>
>So we know for sure the speedwin was somewhere smaller than a few % from moving
>32 bits to 64.

"smaller than a few %"??? :)

>
>The speedwin from crafty when going from K7 to Itanium2, was real real small.

I have absolutely no idea, again, what you are talking about. Eugene's numbers have produced an Itanium result that is not that far behind the 1ghz opteron numbers... Your K7 is in that speed range? I don't think so...

>
>The speedwin from DIEP when going from K7 to itanium2 was *huge*.
>
>As proven by Johan de Gelas, a big L3 cache is not the reason. It just hardly
>helps single cpu nor dual for DIEP (see aceshardware.com P4C versus P4EE).
>
>The opteron however has a way faster LATENCY for memory. Randomly accessing
>memory is way faster.

Can we go back to fact? 1cpu latency is between 60 and 70ns. 1cpu access times are between 60-70, to 180-210 if you use < 4 gigs of ram, add another 60-70 if you go bigger. That is maybe 2x faster than the intel boxes of today. It depends greatly on the size of the TLB. But on the opteron, if you stretch memory beyond 4 gigs, you make the map even more complex.
>
>As we can see from profilers, crafty and many other chessprograms (diep too) are
>dependent upon the RAM speed quite a lot.
>
>The speedwin for crafty from moving K7 to opteron, even in 32 bits, it is
>*huge*.
>
>Additionally, when running parallel, crafty has a very inefficient programming
>for its search threads. It needs an extra pointer everywhere.

Define "inefficient"? Perhaps "not the way Diep does it?" That's not a reasonable definition. The pointer is not expensive. On the Opteron it is even less expensive.

>
>So the movement from 8 registers to 16 registers is a big win for crafty (way
>less for DIEP there).
>
>Opteron is 50-60% faster a cycle for DIEP than K7, thereby even outgunning the
>IPC for itanium2.
>
>I didn't have the chance yet to optimize with a real good efficient compiler for
>opteron (hopefully gcc 3.4 which is just pre-released will do a good job). Nor
>did i try the pathscale compiler yet.
>
>Especially from gcc 3.4 i expect a lot. The win from 8 registers to 16 i expect
>not so much from for DIEP like it trivially gives to crafty.

The gcc released with Suse 9 (as I used in the last ICCT) uses all 16 registers.

>
>The move from 32 bits to 64 bits for crafty will be under 5% speedwin though.

Absolute and utter hogwash.

>
>>
>>
>>>
>>>dave
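
For anyone following the "1*64 or 2*32" point quoted above, here is a minimal sketch in C of the same bitboard operation done once on a native 64-bit word and once on two 32-bit halves. Everything in it is an assumption made for illustration: the names (bb32x2, split64, join64, north_one_64, north_one_32x2, check_north_one) are hypothetical and are not taken from Crafty or DIEP, and the square mapping (a1 = bit 0) is just one common convention.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; none come from Crafty or DIEP. */

/* A bitboard held as two 32-bit halves, as a 32-bit build has to do.
   Assumed mapping: a1 = bit 0, h8 = bit 63. */
typedef struct {
    uint32_t lo; /* bits 0..31  (ranks 1-4 under the assumed mapping) */
    uint32_t hi; /* bits 32..63 (ranks 5-8 under the assumed mapping) */
} bb32x2;

/* Split a 64-bit bitboard into its low and high 32-bit halves. */
static bb32x2 split64(uint64_t b) {
    bb32x2 r;
    r.lo = (uint32_t)b;
    r.hi = (uint32_t)(b >> 32);
    return r;
}

/* Join two 32-bit halves back into one 64-bit bitboard. */
static uint64_t join64(bb32x2 b) {
    return ((uint64_t)b.hi << 32) | b.lo;
}

/* Move every set square up one rank: a single shift on a 64-bit word. */
static uint64_t north_one_64(uint64_t b) {
    return b << 8;
}

/* The same move on two halves: two shifts plus carrying the top byte
   of the low half (rank 4) into the bottom of the high half (rank 5). */
static bb32x2 north_one_32x2(bb32x2 b) {
    bb32x2 r;
    r.hi = (b.hi << 8) | (b.lo >> 24);
    r.lo = b.lo << 8;
    return r;
}

/* Check that both forms agree for a given bitboard. */
static void check_north_one(uint64_t b) {
    assert(join64(north_one_32x2(split64(b))) == north_one_64(b));
}
```

This says nothing about which form is faster on a given CPU; it only shows the extra carry work the split form needs, which is the kind of thing the register-count and latency arguments in this thread are about.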
Last modified: Thu, 15 Apr 21 08:11:13 -0700