Author: Vincent Diepeveen
Date: 08:34:37 06/10/98
Go up one level in this thread
On June 08, 1998 at 22:50:16, Robert Hyatt wrote: [about generating speed] I had something left to post. After we talked at ICC, i got some questions so i'll put the comparision here too. At my Pro200 crafty is indeed faster, but that has a different reason: i'm running a crafty version which is compiled with microsoft visual c++, just like my program is, and at your pro you compile under linux, so you may be glad that there is a compiler for that platform called GCC. So i'll use that msvc++ version to compare with. I took 2 positions. First i took openingsposition, as we start the game always over there, then i secondly also took BK22 as you liked to see. Same test also done at a PII-266 EDORAM by Bradley Woodward, to see how fast a PII is relatively (intel says PII processor is in fact a pentium pro processor so we should see the same speed differences) to the programs. Openingsposition Pentium Pro EDO 60ns: Crafty : 2,103,049 Diep : 3,081,664 46.5% faster BK22 Pentium Pro EDO 60ns: Crafty : 2,941,176 Diep : 3,699,552 25.8% faster Openingsposition PII-266 EDO Crafty : 2,557,544 Diep : 3,906,250 52.7% faster BK22 PII-266 EDO Crafty : 3,470,031 Diep : 4,874,446 40.5% faster Note that i store a move with 4 ints, where crafty stores it in 1 int. So i'm doing 4 times more and i do that about 50% faster. 266/200 * 3669552 = 4880504 which equals 4874446 almost. 266/200 * 3081664 = 4098613 which is almost 200K slower. So we can say that diep gets faster when Mhz gets faster, where crafty has big problems to become faster when Mhz gets faster. How comes? Lonnie tested the K6-266 SDRAM risc machine to get 3.34M generations a second at openings position, and at 300Mhz (100Mhz bus) it gets 3.75M generations a second. Still not getting near PII speed, but considering its price a very good processor. >one thing I didn't explain that deserves a comment. Probably the >fastest >way I know to generate moves is the "move table" approach, which lets >you >almost reduce a move generator to a memory-to-memory copy. which on a >PC >is not a great thing. IE on the pentium pro/200mhz machine, a memory >read takes about 20 clock cycles (100ns). Ditto for each memory write. >So a memory copy runs at a rate of one word every 200ns, or about 5 >million words per second, max. >When we take programs like darkthought or crafty (or other bitmap >programs) to a 64 bit architecture, that has a FirstOne() or LeadZ() >type instruction, then the bitmap generators get a big boost.. because >we only have to do 1/2 the memory traffic (stores only) since the moves >come from the 64bit register rather than memory. And while the CPU is >waiting on the slow memory write, it can cycle back thru the loop, and >be ready for the next store as soon as the current one finishes >(actually, >good architectures can handle a backlog of delayed writes, but they >eventually fill the delayed write queue and stall. But the point is, >a bitmap program will be able to drive memory at 10M words per second >(assuming P6/200 memory speeds) which is 10M moves per second, rather >than the read/write copy scheme that will be only 1/2 the speed. > >Once we get the firstone in hardware so we don't have to waddle off to >a procedure for every move... Note that firstone is already in the hardware instruction at 386. Called BSF and BSR. Speed of BSF/R at pentium pro: 2/3 micro-op and non pairable, where normal instructions like MOV are 1 micro-op and pairable. But supposing that this cache-to-register-copy gets as fast as you want that firstone to be, then this cache approach always beats you, and the facts are there. Please don't tell stories about why i could not get my generation table out of cache but need to get it out of slow memory, caches get bigger and i have done EFFORT to get my tables within L2 cache, and in the future that'll be in L1 cache too, where in the future that firstone of your still is that slow at intel (and suppose BSF and BSR get faster then you still need the same slow loop to get the squares out AFTER you have way more overhead than i did, where i get the squares directly). Problems of bitboards are not the BSF and BSR instructions. problem is that you still need that loop to get the squares out and that you need to program for EVERY piece, where i with 1 general purpose routine can generate all pieces. If i would program for generation speed and store it in 1 int and program everything for every piece then i'm of course way more faster than the current speed diff. The second problem of bitboards is that you can do only a limited number of things with it. Suppose i have a table with mobility values for it for every square, where values are generally different, as controlling a1 is not as important like controlling d4 is. Quite logical. Can you show some sample code how you count this with bitboards? That's pure horror in my opinion. Further i use in my evaluation attacktables which consists out of a number of things, like what piece attacks it and the number of attackers. The number of attackers is something i use a lot in my eval. I have this number in an array int A[64], which definitely is in the cache. So i lookup this with A[square]. Now i think this is also a major disadvantage of bitboards. To look it up with bitboards you need to add all kinds of different things with masses of 'if'. I'm not asking to show how you do this in bitboards, that would give you nightmares. If you call a function then show the audience how much things that function does, JUST TO CALCULATE IT FOR 1 SQUARE, where i have all 64 ready. Where you are wating till you get an alpha for free, i'm waiting for a processor where branch prediction doesn't take that much systemtime when it mispredicts. I have so many compares in my eval which takes 90% of systemtime, that i'm totally depending on the speed of the processor after misprediction. That would speed up my program several times, where bitboards would slow it down another 4 times or so, and take 2 times more code too, which causes again some extra delay. Greetings, Vincent
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.