Author: Eugene Nalimov
Date: 11:14:01 06/22/00
Go up one level in this thread
On June 22, 2000 at 14:06:52, Robert Hyatt wrote:
>On June 22, 2000 at 13:24:46, Tom Kerrigan wrote:
>
>>On June 22, 2000 at 11:28:39, Eugene Nalimov wrote:
>>
>>>On June 22, 2000 at 11:10:25, Ed Schröder wrote:
>>>
>>>>On June 22, 2000 at 08:41:18, Robert Hyatt wrote:
>>>>
>>>>>On June 22, 2000 at 01:06:43, Ed Schröder wrote:
>>>>>
>>>>>>On June 21, 2000 at 21:51:01, Robert Hyatt wrote:
>>>>>>
>>>>>>>On June 21, 2000 at 19:14:24, Tom Kerrigan wrote:
>>>>>>>
>>>>>>>>On June 21, 2000 at 17:16:05, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>On June 21, 2000 at 14:52:34, Tom Kerrigan wrote:
>>>>>>>>>
>>>>>>>>>>On June 21, 2000 at 14:38:23, Christophe Theron wrote:
>>>>>>>>>>
>>>>>>>>>>>On June 21, 2000 at 13:33:14, Tom Kerrigan wrote:
>>>>>>>>>>>
>>>>>>>>>>>>On June 20, 2000 at 21:39:01, Robert Hyatt wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>On June 20, 2000 at 15:52:53, Tom Kerrigan wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>On June 20, 2000 at 15:03:48, Robert Hyatt wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>On June 20, 2000 at 14:07:39, Tom Kerrigan wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>On June 19, 2000 at 21:32:56, Robert Hyatt wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>On June 19, 2000 at 20:50:11, John Coffey wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>On June 19, 2000 at 19:48:36, Larry Griffiths wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>I have found bitboards to be an even trade-off on my Pentium system. I have to
>>>>>>>>>>>>>>>>>>>update about 6 bitboards when a piece moves and this generates a lot of
>>>>>>>>>>>>>>>>>>>instructions. I get it back in my IsKingInCheck code so it evens out. I like
>>>>>>>>>>>>>>>>>>>to have fast move generation code, but most of my gains have been through
>>>>>>>>>>>>>>>>>>>alpha-beta, hash-table, killer-move and movelist ordering etc.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>Larry.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>Maybe I am too much of a novice, but I don't see yet why I should convert over
>>>>>>>>>>>>>>>>>>to bitboards. Is move generation faster? If so, why? My program scans the
>>>>>>>>>>>>>>>>>>board and uses simple loops to generate moves. Do you not have to do loops
>>>>>>>>>>>>>>>>>>with bitboards?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>Not to generate moves, No. You generate all the sliding piece moves with two
>>>>>>>>>>>>>>>>>table lookups...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Hmmm. I do table lookups all over my program, and none of them seem to be
>>>>>>>>>>>>>>>>generating any moves...
>>>>>>>>>>>>>>>>The fact is that you DO need to loop to generate moves in a bitboard program.
>>>>>>>>>>>>>>>>Maybe it's not the same loop, but it's still a loop.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>-Tom
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Who says so? Ask the Dark Thought guys.
>>>>>>>>>>>>>>>Or Slate/Atkin. You only need to loop if you want to take the attack bitmap
>>>>>>>>>>>>>>>and turn it into a list of moves. This is not the way _all_ programs operate
>>>>>>>>>>>>>>>(chess 4.x, Dark Thought, others, any of which generate a few moves at a time,
>>>>>>>>>>>>>>>then take one and search it, without enumerating the other moves.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>So loops are something you do (with bitmaps) if you want to, not because you
>>>>>>>>>>>>>>>have to.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>As far as your table lookups not generating any moves, that is a programming
>>>>>>>>>>>>>>>issue. Mine do. :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Maybe your makemove() function can take bitboards as input (i.e., here is a set
>>>>>>>>>>>>>>of squares that my pieces can move to) but mine sure can't.
>>>>>>>>>>>>>>-Tom
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>You are missing the point. A move generator _can_ emit a single move, which
>>>>>>>>>>>>>can be fed into MakeMove(). Read "Chess Skill in Man and Machine", the chess
>>>>>>>>>>>>>4.x section. They explain this pretty well. It takes zero loops to emit a
>>>>>>>>>>>>>single chess move. You pick the source square. You do two table lookups for
>>>>>>>>>>>>>bishops (say) and you have all the target squares it can move to. A single
>>>>>>>>>>>>>FirstOne() and you have a <to> square, which is all you need to make the move,
>>>>>>>>>>>>>and recursively call Search().
>>>>>>>>>>>>
>>>>>>>>>>>>So you end up having to call gen() a mess of times. I don't see how that isn't a
>>>>>>>>>>>>loop.
>>>>>>>>>>>>-Tom
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>As I understand he says that in order to generate one move he doesn't have to
>>>>>>>>>>>loop. That's what James explains in another post.
>>>>>>>>>>>
>>>>>>>>>>>With 0x88 or 16x you have to loop thru empty squares, he says with bitboards you
>>>>>>>>>>>don't have to. For each rank, file or diagonal in any configuration (by
>>>>>>>>>>>configuration I mean set of empty squares in this rank/file/diagonal), you can
>>>>>>>>>>>have precomputed arrays that instantly give you the set of squares (a bitboard)
>>>>>>>>>>>a sliding piece can move to.
>>>>>>>>>>>
>>>>>>>>>>>Not that I support his point of view about bitboards. I prefer to "loop thru
>>>>>>>>>>>empty squares" in my L1 cache rather than clobbering the same cache with
>>>>>>>>>>>bitboards. And anyway, the time required to extract the rank/file/diagonal from
>>>>>>>>>>>the "occupied" bitboard and the time required to process the resulting set of
>>>>>>>>>>>"can move to" squares is not required in 0x88 or 16x. And for non-sliding pieces
>>>>>>>>>>>(which represent in average half of the pieces present on the board), the method
>>>>>>>>>>>does not apply.
>>>>>>>>>>
>>>>>>>>>>Exactly, it's necessary to process the resulting bitboards. Maybe you can do
>>>>>>>>>>some simple operations to get an interesting set of bits, but at some point, you
>>>>>>>>>>have to turn those bits into something useful. There has to be a loop somewhere
>>>>>>>>>>which extracts the bits and does appropriate things to them. If you're lucky,
>>>>>>>>>>your processor has BSF/BSR (or an equivalent) and this loop is relatively fast.
>>>>>>>>>>But if you don't have these instructions, I bet the pretty bit patterns aren't
>>>>>>>>>>helping you much. Personally, I'm a little sick of people saying, "oh, one AND
>>>>>>>>>>operation and I'm done!" and totally ignoring everything else that has to be
>>>>>>>>>>done.
>>>>>>>>>>
>>>>>>>>>>-Tom
>>>>>>>>>
>>>>>>>>>You are not nearly so sick of hearing that as I am sick of hearing people talk
>>>>>>>>>about what you can and can't do with bitboards _without_ ever having tried them.
>>>>>>>>>
>>>>>>>>>Again, there is _no_ need for a loop. I can generate a single move (capture)
>>>>>>>>>with no loop of any kind. Anybody can generate a non-capture (single move)
>>>>>>>>>without a loop, of course. But captures are way more common to want, since
>>>>>>>>>they are usually tried first.
>>>>>>>>
>>>>>>>>Fine, let's review something you said earlier:
>>>>>>>>>Not to generate moves, No. You generate all the sliding piece moves with two
>>>>>>>>>table lookups...
>>>>>>>>
>>>>>>>>So how about you tell me how you're going to generate multiple moves ("all the
>>>>>>>>sliding piece moves") in some sort of machine-usable form without doing a loop?
>>>>>>>>Remember, a loop around the move generator is still a loop. Nobody's asking
>>>>>>>>whether or not you can generate a single stupid capture without a loop, and
>>>>>>>>there's no practical value in that anyway, unless you can be sure that the
>>>>>>>>capture is generated in the correct order.
>>>>>>>
>>>>>>>
>>>>>>>Easy, again from Chess Skill in man and machine. I produce a 64 bit value
>>>>>>>for all the bishop moves, by doing two table lookups. I already know the
>>>>>>><from> square to produce these moves. I use a FirstOne() call to find one
>>>>>>>of the destination squares (<to> square). I clear this bit, save the 64 bit
>>>>>>>vector, and make this move. I recursively call search. When it returns, I
>>>>>>>regrab the 64 bit vector, FirstOne() to find the next destination, make this
>>>>>>>move and again call Search() recursively. The only loop I have is the same
>>>>>>>loop everyone has to select the next move. I have _zero_ loops to _generate_
>>>>>>>the moves.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>I've done 0x88, 8X8, 16x16, 10x12, and probably others. I don't think that
>>>>>>>>>move generation is the separating point for bitboards vs the others, except
>>>>>>>>>for the fact that I can generate captures far easier. Bitboards help in other
>>>>>>>>>places as well. And on 64 bit architectures, they make a lot of sense.
>>>>>>>>>
>>>>>>>>>You ought to do what I did 5 years ago. Say "I am going to try this for a
>>>>>>>>>couple of years, to see if this is worthwhile." It takes a lot of time and
>>>>>>>>
>>>>>>>>Remember, I did use bitboards for a while, I know many of the issues involved.
>>>>>>>
>>>>>>>
>>>>>>>I have used them for 5 years. I have learned far more. And I am still finding
>>>>>>>new ways to do things every few months. The learning doesn't stop after using
>>>>>>>them "for a while".
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>experience and false starts to do bitboards. But they _can_ work quite well.
>>>>>>>>>I can point to several programs that prove this, from Kaissa and chess 4.x in
>>>>>>>>>the 1970's, thru Crafty and several others in 2000.
>>>>>>>>>
>>>>>>>>>But they _do_ take time to learn, just like a programming language does.
>>>>>>>>
>>>>>>>>You have your way, I have my way. In case you didn't notice, I'm not saying one
>>>>>>>>way is better or worse. (Not in this thread, anyway.) So I don't see why you're
>>>>>>>>being so violently pro-bitboard. All I'm saying is that you should not jump in
>>>>>>>>and say that you can solve the world's problems with a single table lookup,
>>>>>>>>because that's simply not accurate.
>>>>>>>>
>>>>>>>>-Tom
>>>>>>>
>>>>>>>
>>>>>>>I'm not violently pro-bitboard at all. I simply corrected some _wrong_
>>>>>>>information that was being posted here. I've said hundreds of times that
>>>>>>>until we talk about 64 bit cpus, bitboards probably do no better than break
>>>>>>>even with other good approaches. I have also said, hundreds of times, that
>>>>>>>move generation is _not_ the most important thing done in a chess board. At
>>>>>>>least for my code, it is not in the top 5 when you profile things.
>>>>>>>
>>>>>>>As far as a lookup goes, I can generate _all_ sliding piece moves for a bishop,
>>>>>>>with two 64 bit memory loads. I can generate all captures for a bishop just as
>>>>>>>easily, without having to traverse the empty squares. I didn't say any more
>>>>>>>or any less than that. Since in a chess engine, generating captures is a very
>>>>>>>common thing to do, bitboards are good there. They are good in other places.
>>>>>>>They also have their problems. But memory bandwidth is not particularly one of
>>>>>>>them. And on machines like the EV6-based 64 bit architectures, I think bitmaps
>>>>>>>might have a real advantage due to the inherent data density they have.
>>>>>>
>>>>>>Having never had the pleasure of dealing with BB's I can understand the
>>>>>>2 x AND to detect the capture. But how do you get the square from the
>>>>>>64 bit integer since you can't address a 64 bit table to get the square
>>>>>>immediately so you have to write a piece of clever code for it, right?
>>>>>>Looks (very) time consuming to me, or?
>>>>>>
>>>>>>Ed
>>>>>
>>>>>
>>>>>Intel has BSF/BSR which finds the number of the first one bit, from either end
>>>>>of the 32 bit value you want. For 64 bits you load the left-end, and if non-
>>>>>zero, BSF it. If zero, you load the right-end and if not zero, you BSF it.
>>>>>
>>>>>only a few instructions. Here is the code for X86:
>>>>
>>>>I was already afraid of such an answer as the principle to get a square
>>>>takes many cycles and if there are more bits set you have to loop anyway
>>>>to get the next square(s). Why this should be superior the simple approach
>>>>of adding the direction add checking the board passes me.
>>>
>>>You have to use 4 loops for 4 directions (or one loop with extra if inside, that
>>>doesn't change anything). Typically you have (say) half of mispredicted branch
>>>per such simple loop, so on average you'll have 2 of them. Mispredicted branches
>>>are evil -- 9 clock cycles on P6/PII/PIII, 20 on Willamette. And that penalty
>>>you have to pay regardless of the possibility of captures.
>>>
>>>With bitboards there would be *no* mispredicted branches as long as there are 0
>>>or 1 possible captures and no more than 2 identical pieces. Branch predictor in
>>>modern CPU is capable of recognizing such regular patterns.
>>>
>>>Eugene
>>
>>It seems that Bob's code for using BSF on a 64-bit value has a branch that would
>>be mispredicted *frequently*.
>>-Tom
>
>
>Did you see eugene's code that used a conditional move? NO branches of any
>kind. We don't use this in Crafty because it would then not work on AMD or
>any Intel cpu prior to the Pentium Pro.
>
>With cmov there are _no_ branches.
See my reply to Tom. The Linux version of code contains only "branch if argument
is zero", that can be safely predicted as not taken. Otherwise it's branch-free
without any conditional moves.
Eugene
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.