Author: Robert Hyatt
Date: 07:39:13 05/30/01
On May 30, 2001 at 09:02:22, Vincent Diepeveen wrote:

>On May 29, 2001 at 20:59:22, Robert Hyatt wrote:
>
>>On May 29, 2001 at 20:17:24, Pham Minh Tri wrote:
>>
>>>On May 29, 2001 at 10:02:10, Robert Hyatt wrote:
>>>
>>>>On May 29, 2001 at 03:40:23, Pham Minh Tri wrote:
>>>>
>>>>>On May 28, 2001 at 01:40:52, Cheok Yan Cheng wrote:
>>>>>
>>>>>>For a game that uses an 8x8 board, like chess, we can use the bitboard
>>>>>>representation with a 64-bit variable. However, for a game that uses a
>>>>>>9x10 board, like Chinese chess, is it possible to use a bitboard
>>>>>>representation, which would need a 90-bit variable (where can I get a
>>>>>>90-bit variable??)
>>>>>>
>>>>>>thanks
>>>>>
>>>>>Chinese chess does not have the "golden number" of 64 that chess has, so it
>>>>>is hard to use bitboards or any representation other than arrays. If you
>>>>>insist on using bitboards, I suggest a "mixed" representation (as some chess
>>>>>programmers have done): use bitboards for some pieces (king, pawn, adviser,
>>>>>elephant - they fit in a 64-bit bitboard) and arrays for the others. However,
>>>>>I do not believe bitboards will bring Chinese chess programmers any benefit
>>>>>or fun.
>>>>>
>>>>>Pham
>>>>
>>>>
>>>>Remember my earlier comment. Chess 4.x ran on a 60-bit machine, which means
>>>>every bitboard operation had to be spread across two words of memory. On the
>>>>32-bit PC, the bitboard operations are one way to take advantage of the two
>>>>integer pipes. Other machines have more than two integer pipes that are very
>>>>hard to keep busy. Doing 3 XOR operations is not so bad on those machines...
>>>
>>>I do not totally agree with you about the reasons for the Slate/Atkin
>>>victories (earlier post). IMHO, they may have won because of other factors,
>>>not only bitboards. Those factors were probably the revolutionary ideas that
>>>accompanied bitboards. I know that many ideas and their implementations could
>>>occur and become clear with bitboards.
>>>So the first inventors of bitboards had the advantage of two sources of
>>>ideas: one from traditional representations, one from their new structure.
>>>
>>>I think that if two chess programmers are otherwise equal (same knowledge,
>>>ideas, technique, etc.), the one using bitboards has no advantage over the
>>>other on machines that are not 64-bit, even though he has to work harder.
>>>
>>>Pham
>>
>>
>>I don't want to start a holy war about bitboards vs mailbox programs. But one
>>simple idea: if you take a program that needs 64-bit words and run it on a
>>64-bit architecture, the "data density" inside the CPU is really useful, since
>>it is gating around 64-bit values, and you need 64-bit values, so the
>>bandwidth inside the CPU is fully utilized. If you take a 32-bit program and
>>run it on a 64-bit architecture, half of every register, half of every
>>internal bus, and half of every internal clock cycle is wasted, because you
>>are pumping half the data the CPU is capable of moving internally.
>>
>>I (or someone working with me) ran a test a couple of years ago, taking a
>>normal gnuchess and a normal Crafty and running both on a PC to get a
>>benchmark number for each. Then the same two programs were compiled and run
>>on an older alpha, and not a very fast one... gnu ran right at 2x faster;
>>Crafty was almost 3.5x faster.
>
>Ugch ugch, you're comparing good ideas that are horribly implemented
>in gnuchess with the assembly-optimized code of Crafty!
>
>Not very fair.

What on earth are you talking about? When I ran Crafty and gnu on the PC, I
_did_ use the best versions available, including my .asm replacement code. When
I ran them both on the alpha, I had to give up _all_ of my assembly and use
pure C. I still got the speedup I indicated, without _any_ assembly on the
alpha... which didn't favor Crafty; it actually favored GNU. Had I compared
non-asm Crafty on both machines, the difference would have been even more
pronounced.
>
>Also, gnuchess treats pieces generically, while in Crafty you have
>written out code for *every* piece.

What does this have to do with anything? I simply wanted to know: "How much
more does Crafty get from a 64-bit machine than a 32-bit program does?" This
experiment answered that perfectly for the case of GNU vs Crafty. I doubt your
speed differential on the two machines will be any different from GNU's.

>
>So the comparison is impossible. You can design much faster loops and
>arrays than gnuchess has, and if one does it in assembly it's even
>faster.

There was no assembly in the alpha version of Crafty, so your comment makes
utterly no sense. Whether GNU is a "good" 32-bit program or not is also
completely irrelevant. The question was how much faster it would run on the
alpha. The answer was given above. Ditto for Crafty, which _is_ designed
around a 64-bit word. I'll be happy to compile Diep and compare it to Crafty
on my alpha if you want. The numbers won't be any different.

>
>>That is pretty significant. Now whether someone can beat my move generator
>
>That's comparing a bicycle, which is made to show the advantage of the
>wheel, with a propellor airplane.
>
>I would rather compare the propellor airplane with a jet engine.
>
>>with an 0x88 (or something better) is a question. Whether they can beat my
>>evaluator using 0x88 is another question. But it is pretty obvious that on a
>>PC they had better beat me by a factor of two, or I will catch up on the
>
>2.2 to be exact.
>
>>64-bit machines. And then some of the bitboard wizardry comes in handy, such
>
>Yes, I'll beat you on a SUN too, no problem.

Anytime you want me to run the test, feel free to send me your code. We have
64-bit sparcs here, and I have an alpha. I can probably run the comparison on
a 64-bit MIPS box too. I already know the answer, however: a 64-bit program
simply runs faster when ported to a 64-bit machine than a 32-bit program does
when ported to the same machine. _Every_ time.
>
>But let's ask how you plan to do mobility on an alpha,
>instead of the rude summation you're doing now!

_SAME_ code as right now.

>
>And how you plan to use attack-table information everywhere in the
>evaluation, using the slow attack function currently in use
>in Crafty!

First, my attack tables are not slow. They are simple direct memory lookups. I
use exactly the same code on any architecture.

>
>In my attack tables I can directly see how many attackers are
>on a square, with one AND instruction on an array within the L1 data cache
>and a constant value:
>  (MyAtt[sq]&0x0ff)
>
>I'm using that loads of times in my evaluation. Also whether a
>square is attacked at all by my opponent:
>
>  OpAtt[sq]

Vincent, I know you have a short attention span, but please stick to the
topic. The issue here is: "How does a 32-bit program perform on a 32-bit
processor and then when moved to a 64-bit processor, compared to how a 64-bit
program performs on a 32-bit processor and then on a 64-bit processor?" You
want to change the question. You can't. I will happily run my program and your
program on both a 32-bit machine and a 64-bit machine and compare the
speedups. Mine will be larger. Every time.

>
>That would be pretty interesting to get fast in bitboards too, but
>I can already tell you, IT'S IMPOSSIBLE to do it quickly!

This from the same person who said "using message passing makes it impossible
to get a speedup," or "it is impossible to win at chess with a quiescence
search that only does limited captures," or any of several other "impossible"
things. Later, you learn how to do them, of course.
>
>Where bitboards are good is in a few things which hardly get used,
>like some complex pawn structures:
>  if( (MyPawn&0x0a00000010001000) == 0x0a00000010001000 )
>
>So you can detect several pawns for a certain side at the same time,
>which is otherwise slower:
>  if( quickboard[sq_a2] == whitepawn
>   && quickboard[sq_b2] == whitepawn
>   && quickboard[sq_c2] == whitepawn )
>
>However, how many programs are there except mine that use loads
>of complex pawn-structure knowledge in evaluation?

Mine, for example. I even compute all the squares all pawns can move to, and
use that in the evaluation.

>
>Now you'll say that you can do other things quickly too, like detecting
>whether a file is empty and such things. However, there are good
>32-bit alternatives that can be seen as bitboards too, which
>I happen to use in DIEP.

There are other "good" alternatives. But you are _still_ using half of an
alpha. _That_ is the point.

>
>So when in 2010 everyone can buy his own 64-bit machine,
>then what I might do is get a 64-bit machine and
>within one week add two 64-bit bitboards for pawns to DIEP!
>
>So I'll do the conversion at the time I have such a machine!
>
>To be a factor of 3 slower now on 99% of all the computers in the world
>(32-bit CPUs) is not my favourite thing to do!

I'm not a factor of 3 slower. That is your imagination working. Some things
are faster in bitmaps, some slower. You want to harp on the move generation.
That is less than 10% of my execution time, so it doesn't count. Evaluation is
very good in bitmaps.

>
>Right now I do manage with some 32-bit equivalents...
>
>The only 64-bit values I now have in DIEP are used for
>node counts... :)
>
>>as very simple tests to answer questions like "here is a bitmap of my passed
>>pawns, do I have an outside passer on one side, or do I have an outside
>>passer on both sides of the board?" Ditto for "here is a bitmap of my
>>candidate passers, ..."
>>Right now, on the PC, those operations are pretty expensive

>
>As I mentioned, this can be done very fast with good 32-bit
>alternatives, which are in fact FASTER than the 64-bit alternatives.
>
>That is, on 32-bit machines!

>>and dig into the advantage of bitmap evaluations. But on the 64-bit
>>machines, those operations lose _all_ of their penalties, and begin to look
>>pretty good.
>
>DIEP suffered bigtime when I converted all my 8-bit stuff to 32 bits,
>because it all occupied more space:
>  - more code size
>  - more data size
>
>DIEP became about 5% slower, and I should mention that I didn't profit
>optimally from 8-bit data structures. The real penalty on P3 processors
>would be around 10%. On the K7, about 50%.
>
>So going from 32 bits to 64 bits, *everything* that becomes 64 bits where it
>was 32 bits occupies more size. So I start by losing something like 10%...
>
>Unless there are very GOOD reasons to move to 64 bits, in the future I'll
>stick to 32 bits for quite a long time with at least 99% of the code!
>
>As 99% of all applications will be 32-bit, processors will surely be
>made very fast with 32-bit code, so no problems there either!
>
>There are quite some problems converting Windows applications to 64 bits,
>because 'int' is generally treated as a 32-bit thing!
>
>>I wouldn't suggest that anyone rewrite their working code. But I made the
>>decision to do this several years ago. I haven't found it a disadvantage on
>>32-bit machines, and I really doubt it will be a disadvantage on 64-bit
>>machines.
>
>A factor of 2.5 on 32-bit machines, for sure.
>
>>you have to think "data density". Which is what made our move generator and
>>some evaluations on the Cray so very fast. But not until you start "thinking
>>outside the box" and asking "how can this architecture help me do things
>>more efficiently?" rather than "How can I make my program run on this
>>architecture?"
>
>>Those are two vastly different questions. With vastly different answers.
>
>Here we all agree, but for me, putting things in a bitboard in general
>means I have one bit worth of information about something. That's too little
>information for me!
>
>So if I ever go to 64 bits, it will be for those parts of my program where
>I can profit.
>
>What I still do not understand is why Crafty is so slow on a 64-bit Sun
>machine. Everything is 64 bits there. For me, a Sun processor performs
>the same as a PII processor at the same speed.
>
>How is Crafty performing on it?
>
>IMHO, the only reason why Crafty isn't so slow on Intel processors is
>that you have far fewer branches than I have in DIEP.
>
>I am using complex patterns in DIEP's evaluation function, so there are
>branches everywhere. In Crafty, the branches that are there are easier
>to predict.
>
>So on 64-bit processors where the branch misprediction penalty is
>big, Crafty will perform very badly compared to DIEP.
>
>On an alpha, the branch misprediction penalty is HUGE.
>
>So on a 21164 processor, which had only 8 KB of L1 cache, I did very badly
>with DIEP. On a 4-processor 21264 machine, DIEP probably does worse than on
>a dual 1.5 GHz K7 Palomino, which hopefully gets released next month.
>
>Now, the branch misprediction penalty of the K7 isn't very good, but
>on the 21264 it's way worse. What the 21264 has to prevent mispredictions
>a bit is a clever system of two predictor tables that reference each other.
>
>However, the size of those tables is so small compared to the number of
>branches in DIEP that it won't speed me up a single percent!
>
>A 633 MHz 21164 processor performed for DIEP like a 380 MHz PII.
>
>Now, that was some time ago, of course. Nowadays DIEP would do worse on
>that 21164 compared to the PII, because code+data size has become
>bigger.
>
>I expect a similar thing for the 21264. Where on paper it can do
>very well because it does 4 instructions a clock, it's very
>likely that a K7 at the same MHz completely outguns it for me.
>
>For Crafty the 21264 is not as bad as it is for me,
>because you suffer relatively less
>from the HUGE branch misprediction
>penalty than I do with DIEP!
>
>It's not that the 64-bit data structure is so good on 64-bit machines; the
>problem is that the 21264 design has a serious problem with branches!
>
>Best regards,
>Vincent