Author: Vincent Diepeveen
Date: 16:39:38 02/29/04
On February 29, 2004 at 16:04:53, Gerd Isenberg wrote:

>On February 29, 2004 at 15:13:49, David Mitchell wrote:
>
>>On February 29, 2004 at 14:44:54, Martin Schreiber wrote:
>>
>>>Hi,
>>>
>>>I've two questions:
>>>
>>>1.)
>>>is using bitboards a necessary condition to write a strong chess engine? And if
>>>not so, what other good/fast solution we have for the board representation?
>>>
>>>2.)
>>>And are there strong freeware or commercial chess engines, which don't use
>>>bitboards?
>>>And what kind of board representation they use?
>>>
>>>Thanks for your comments
>>>Martin
>>
>>1. No, bitboards are not necessary in order to write a strong chess engine.
>>2. I would guess 0x88 is as fast as bitboards for 64 bit cpu's, and slightly
>>faster than bitboards on 32 bit cpu's. Hard to make a direct comparison because
>>with bitboards, you get more info your program can use later in the eval, etc.
>>
>>If you click on Computer Resource Center -> Chess links, and select Crafty, you
>>can find and d/l an excellent write up by Robert Hyatt on this subject.
>>
>>Bitboards take a while to learn to use well. Many commercial programs have not
>>used them in the past, but may in the future if the 64 bit cpu's become quite
>>popular, because of the 2x (at least) speed up bitboards achieve on them.
>
>Not per se with AMD64 or intel64.
>
>64-bit instructions do have an additional prefix byte.
>So the codesize advantage may be only 3/4 instead of 1/2.
>
>Latency of 64 bit instructions is sometimes worse (bsf, mul).
>
>Two independent 32-bit instructions are likely to gain more parallelism.
>
>It doesn't matter much, whether 1*64 or 2*32 bit are loaded/stored, considering
>some latency and internal bus widths.
>
>More important features with AMD64, and that is not only helpful for bitboards,
>are the doubled register-file size, the bigger 2.level cache, improved branch
>prediction, two more pipe stages and more.
>
>OTOH register hungry bitboard algorithms which are not efficiently possible with
>x86-32 became more interesting now.
>
>Gerd

The best way to put all this in perspective is to look at Crafty's speed gain at SPECint when moving from 32 bits to 64 bits.

A 1 GHz Alpha 21264, which can retire 4 instructions per cycle (8-issue wide), with huge caches (especially a huge L1) and great branch prediction, ran Crafty at the same speed as a 1.33 GHz K7. So we know for sure that the gain from moving from 32 bits to 64 was smaller than a few percent.

Crafty's gain when going from K7 to Itanium2 was really small. DIEP's gain when going from K7 to Itanium2 was *huge*.

As Johan de Gelas has shown, a big L3 cache is not the reason. It hardly helps DIEP, whether single CPU or dual (see aceshardware.com, P4C versus P4EE).

The Opteron, however, has far lower memory LATENCY. Random memory accesses are way faster.

As we can see from profilers, Crafty and many other chess programs (DIEP too) depend quite a lot on RAM speed.

Crafty's gain from moving from K7 to Opteron, even in 32 bits, is *huge*.

Additionally, when running in parallel, Crafty's search threads are programmed very inefficiently: it needs an extra pointer everywhere. So the move from 8 registers to 16 is a big win for Crafty (much less so for DIEP).

Opteron is 50-60% faster per cycle for DIEP than K7, thereby even outgunning the IPC of Itanium2.

I haven't had the chance yet to optimize with a really good, efficient compiler for Opteron (hopefully gcc 3.4, which has just been pre-released, will do a good job). Nor have I tried the PathScale compiler yet. I expect a lot from gcc 3.4 especially.

I don't expect the move from 8 registers to 16 to give DIEP nearly as much as it trivially gives Crafty.

The move from 32 bits to 64 bits, though, will give Crafty a gain of under 5%.

>
>
>>
>>dave
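For readers who want to see the "1*64 or 2*32" point from the quoted discussion in code, here is a minimal C sketch. It assumes a bitboard layout with a1 as bit 0 and h8 as bit 63. The names Bitboard, Bitboard32, north64, north32, split, join and check_north are made up for this illustration and are taken from neither Crafty nor DIEP.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per square; assumed mapping: a1 = bit 0, h8 = bit 63. */
typedef uint64_t Bitboard;

/* The same board as two 32-bit halves, the way a 32-bit CPU holds it.
   lo = ranks 1-4, hi = ranks 5-8 (hypothetical layout for this sketch). */
typedef struct { uint32_t lo, hi; } Bitboard32;

/* Split a 64-bit board into its two halves. */
Bitboard32 split(Bitboard b) {
    Bitboard32 r;
    r.lo = (uint32_t)(b & 0xFFFFFFFFu);
    r.hi = (uint32_t)(b >> 32);
    return r;
}

/* Join two halves back into one 64-bit board. */
Bitboard join(Bitboard32 b) {
    return ((Bitboard)b.hi << 32) | (Bitboard)b.lo;
}

/* Move every piece one rank up: a single shift with 64-bit registers.
   Pieces on rank 8 fall off the board. */
Bitboard north64(Bitboard b) {
    return b << 8;
}

/* The same move on 32-bit halves: rank 4 has to be carried from the
   low half into the high half by hand. */
Bitboard32 north32(Bitboard32 b) {
    Bitboard32 r;
    r.hi = (b.hi << 8) | (b.lo >> 24); /* rank 4 becomes rank 5 */
    r.lo = b.lo << 8;
    return r;
}

/* Sanity check: both versions must produce the same board. */
void check_north(Bitboard b) {
    assert(join(north32(split(b))) == north64(b));
}
```

With 64-bit registers north64 is one shift, while the 32-bit version needs three shifts and an OR. That shows where the extra instructions come from, nothing more; how much it matters for a whole engine is exactly what the post above disputes.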
Last modified: Thu, 15 Apr 21 08:11:13 -0700