Computer Chess Club Archives

Subject: Re: Bit board representation (more info)

Author: Robert Hyatt
Date: 07:57:20 05/31/01
On May 31, 2001 at 06:06:41, Vincent Diepeveen wrote:

>On May 30, 2001 at 10:49:22, Robert Hyatt wrote:
>
>>On May 30, 2001 at 09:02:22, Vincent Diepeveen wrote:
>>
>>>
>>>>as very simple tests to answer questions like "here is a bitmap of my passed
>>>>pawns, do I have an outside passer on one side, or do I have an outside passer
>>>>on both sides of the board?"  Ditto for "here is a bitmap of my candidate
>>>>passers, ..."  Right now, on the PC, those operations are pretty expensive
>>>
>>>As I mentioned, this can be done very fast with good 32-bit alternatives,
>>>which are in fact FASTER than the 64-bit alternatives.
>>>
>>>That is, on 32-bit machines!
>>>
>>>>and dig into the advantage of bitmap evaluations.  But on the 64 bit machines,
>>>>those operations lose _all_ of their penalties, and begin to look pretty good.
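The "very simple tests" described above can be sketched with a handful of bitwise operations. This is a minimal illustration, not Crafty's code. It assumes squares numbered a1 = bit 0 through h8 = bit 63, and it uses a simplified, hypothetical definition of "outside": a passer on a file beyond every enemy pawn's file on that side of the board.

```c
#include <stdint.h>

typedef uint64_t Bitboard;  /* assumption: a1 = bit 0, b1 = bit 1, ..., h8 = bit 63 */

/* Fold all eight ranks onto one byte: bit f set means file f is occupied (bit 0 = a-file). */
static unsigned file_mask(Bitboard b) {
    b |= b >> 32;
    b |= b >> 16;
    b |= b >> 8;
    return (unsigned)(b & 0xFF);
}

/* Hypothetical helper: 1 = a passer lies to the a-side of every enemy pawn,
   2 = a passer lies to the h-side of every enemy pawn, 3 = both, 0 = neither.
   With no enemy pawns at all, any passer counts on both sides. */
static int outside_passers(Bitboard my_passers, Bitboard their_pawns) {
    unsigned p = file_mask(my_passers);
    unsigned e = file_mask(their_pawns);

    /* Files strictly to the a-side of the leftmost enemy pawn file. */
    unsigned left_of_enemy = (e & (0u - e)) - 1u;

    /* Smear enemy files toward the a-file; the complement is every file
       strictly to the h-side of the rightmost enemy pawn file. */
    unsigned smear = e;
    smear |= smear >> 1;
    smear |= smear >> 2;
    smear |= smear >> 4;
    unsigned right_of_enemy = ~smear & 0xFFu;

    int result = 0;
    if (p & left_of_enemy)
        result |= 1;
    if (p & right_of_enemy)
        result |= 2;
    return result;
}
```

For example, a white passer on a2 against black pawns only on d6 and e6 gives 1 (an outside passer on the queenside only). No square-by-square loop is needed; the whole test is a few shifts, ANDs and ORs on 64-bit words, which is exactly the kind of work that is cheap on a 64-bit machine.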
>>>
>>>Diep suffered badly when I converted all my 8-bit stuff to 32 bits,
>>>because it all occupied more space:
>>>  - more code size
>>>  - more data size
>>
>>Do you understand that some machines don't behave like a PC?  E.g., take an IBM
>>RS6000 workstation and convert every floating point value from 4-byte floats to
>>8-byte doubles.  The speed won't change one iota, because the machine does all
>>fp operations in 8-byte mode _anyway_.
>
>But no average user will ever have an RS6000 workstation.
>
>They will all get a K7, however.

So?  What says that the K8 won't move 8 bytes of data around regardless
of the data type you want to use?  Many machines are already doing that.
Just try to do a one-byte read from memory on a PC.  You get 32.



>
>>
>>
>>
>>>
>>>Diep became about 5% slower, and I need to mention that I didn't profit
>>>optimally from 8-bit data structures. The real penalty on P3 processors
>>>would be around 10%. On the K7, about 50%.
>>>
>>>So going from 32 bits to 64 bits, *everything* that becomes 64 bits and
>>>was 32 bits before occupies more space. So I lose something like 10% to *start* with...
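The size argument here is simple arithmetic: the same 64-entry per-square table takes 64, 256 or 512 bytes depending on element width. The sketch below counts how many cache lines each version spans, assuming 32-byte lines (the PC figure Hyatt gives elsewhere in this thread). The table types are hypothetical, not taken from DIEP.

```c
#include <stddef.h>
#include <stdint.h>

#define SQUARES 64
#define CACHE_LINE_BYTES 32  /* assumption: 32-byte lines, as on the PCs discussed in this thread */

/* Hypothetical per-square tables that differ only in element width. */
typedef uint8_t  Table8[SQUARES];
typedef uint32_t Table32[SQUARES];
typedef uint64_t Table64[SQUARES];

/* Cache lines spanned by `bytes` bytes, assuming the table starts on a line boundary. */
static size_t cache_lines(size_t bytes) {
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

Here `cache_lines(sizeof(Table8))` is 2, `cache_lines(sizeof(Table32))` is 8 and `cache_lines(sizeof(Table64))` is 16. Whether those extra lines actually cost time depends on how much of the table a search touches and on how fast lines are filled, which is what the two posters disagree about.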
>>
>>Again, not necessarily.  On the alpha the bus is 4x wider than on the PC.
>
>The bus is not the most important thing for computer chess. It is for many
>applications, hence it's so wide on the Alpha and on very expensive machines.
>
>But you also need 4x more L1 cache, which is a big disaster for computer chess.
>You also need 4x more L2 cache, etcetera.

Again, "what are you talking about?"  On the PC, a cache line is 32 bytes.
If you reference _any_ byte in that line, the cache controller loads all 32
bytes from memory.  That takes 4 bus transactions.  On the alpha it takes
exactly one.  The cache needs to be no bigger.  It just fills 4x faster.
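The bus arithmetic in the paragraph above can be checked directly: the number of transfers needed to fill one line is the line size divided by the bus width, rounded up. The widths used here are assumptions read off the post: an 8-byte (64-bit) PC data bus, and a bus "4x wider" (32 bytes) for the alpha. The sketch ignores latency, burst timing and everything else a real memory system does.

```c
#include <assert.h>

/* Bus transfers needed to fill one cache line; both widths in bytes. */
static unsigned line_fill_transfers(unsigned line_bytes, unsigned bus_bytes) {
    assert(bus_bytes > 0);
    return (line_bytes + bus_bytes - 1) / bus_bytes;  /* round up */
}
```

With a 32-byte line, `line_fill_transfers(32, 8)` is 4 and `line_fill_transfers(32, 32)` is 1. The line size, and so the cache capacity needed for a given working set, is identical in both cases; only the fill rate changes.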





>
>>
>>
>>>
>>>Unless there are very GOOD reasons to go to 64 bits in the future, I'll stick
>>>to 32 bits for quite a long time with at least 99% of the code!
>>>
>>>As 99% of all applications will be 32 bits, you can be sure processors will
>>>be designed to run 32-bit code very fast, so no problems there either!
>>>
>>>There are quite a few problems converting Windows applications to 64 bits,
>>>because 'int' in general is seen as a 32-bit thing!
>>>
>>>>I wouldn't suggest that anyone rewrite their working code.  But I made the
>>>>decision to do this several years ago.  I haven't found it a disadvantage on
>>>>32 bit machines, and I really doubt it will be a disadvantage on 64 bit
>>>>machines.
>>>
>>>It is a factor of 2.5 on 32-bit machines, for sure.
>>
>>If that is true, why is my nps value in line with everybody else's?  Bitboards
>
>You do way less in qsearch than others do. Crafty is using loads of
>inline assembly. If you also did checks in qsearch, and other
>things in qsearch like most commercial programs do, then your node rate
>would drop by a factor of 2.5 at least.

Again, and again, "what are you talking about?"  I used to do checks in
q-search.  My NPS was _higher_, as a call to the move generator resulted in
more moves being used.  Now I generate all captures, but don't search most.
That _hurts_ NPS; it doesn't help it.

Also, crafty is not using "loads of inline assembly".  It is using maybe 100
lines.  Would you care to guess how much slower it is with those lines removed?
I'll wait to tell you until you guess.  Hint:  It is _not_ 2x slower without
it.





>
>What you did is pretty amazing, actually. You took the bitboard concept
>and did those things with it which are not dead slow with it. But suppose
>you add things that others like me do: for example, I *always* generate all
>moves in qsearch, by choice, and I select those moves which I think are
>interesting to try to improve my score.

Why would I want to blow the tree up like that, when what I do works just
fine, just as it does for Ferret...


>
>What's the difference between a good capture and a good passer move?
>
>>are _not_ inherently 2.5 times slower.  In fact, it is much less as it is easier
>>to keep two int pipes busy.
>
>If Crafty remains as it is, your NPS might seem impressive, but compare
>it to Goliath getting 2.5M nodes a second at I-CSVN.

And your point would be???  I can run over 2x faster just by going back to a
very primitive evaluation.  I am not interested in doing that.




>
>Now that's assembly too.
>
>Compare it with Fritz getting 1.5M nodes a second on a dual P3;
>now that's assembly too.
>
>And Fritz is not only outsearching you, it's also doing MORE in qsearch
>and doing more extensions also.

Again, so what?  You are confusing what the programmer does with what the
programmer _can_ do.  Anything you do in mailbox, I can do in bitboards.
I don't claim they are faster in everything.  I do _know_ they are not
slower everywhere.  At worst, it is a break-even.  But on a 64 bit machine,
it isn't...




>
>Crafty is always researching the same game space, and you don't hear
>me say that's a bad thing to do, but you simply can't compare your node
>counts with someone else's, because both positionally
>and tactically at depth n you see way less than I see at depth n.

At 3 minutes per move, I see more than you see at 3 minutes per move,
however.  And _that_ is the key issue.  Not depth.  _time_.




>
>Now Crafty has a few things in eval, like a bit of bad/good bishop code,
>which Fritz seemingly does not have, so you have an advantage there.
>
>That is interesting in practical game play.
>
>Tiger is getting such a low NPS because it spends so much effort on pruning.
>If Tiger didn't prune, it would get loads more NPS. Christophe
>might be able to explain...
>
>Same for Shredder. Shredder does forward pruning big time. If it didn't
>forward prune, its NPS would go sky-high.
>
>In Crafty you always nullmove, which is a good thing; that gives more
>NPS, from nodes which others do not search.
>
>If I nullmove a bit less in Diep, I need loads fewer nodes for each
>ply.
>
>So nodes aren't even comparable.
>
>You know that, i know that.
>
>Now try to add some attacks to the evaluation!

I used to do attacks in evaluate.  They are easy to do.  I didn't like the
result.  You simply have to look through the comments in main.c to see when I
tried this and when I took it out.

You have the mistaken opinion that some things are simply not possible in
bitmaps.  I don't know where that comes from, but it does _not_ come from
reality.

We went through this once for the Cray and "quality mobility".  I explained how
I did it in Cray Blitz after you said "it is impossible."  You then said "oh"
and that ended the conversation.  Don't say something can't be done until you
have tried it.  You spend too much time "thinking inside the box", which is a
mistake.
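As a rough illustration of why an attack term need not be slow with bitboards, the sketch below computes every square attacked by a set of knights with eight masked shifts, then intersects the result with a bitboard of weak enemy pawns (the kind of term mentioned in the thread). It is not code from any version of Crafty. The weak-pawn bitboard is a hypothetical input that some other part of the evaluation would have to supply, and squares are assumed to be numbered a1 = 0 to h8 = 63.

```c
#include <stdint.h>

typedef uint64_t Bitboard;  /* assumption: a1 = bit 0, ..., h8 = bit 63 */

#define NOT_A_FILE  0xfefefefefefefefeULL
#define NOT_AB_FILE 0xfcfcfcfcfcfcfcfcULL
#define NOT_H_FILE  0x7f7f7f7f7f7f7f7fULL
#define NOT_GH_FILE 0x3f3f3f3f3f3f3f3fULL

/* All squares attacked by any knight in `knights`. Each mask discards
   moves that wrapped around the board edge onto the opposite side. */
static Bitboard knight_attacks(Bitboard knights) {
    return ((knights << 17) & NOT_A_FILE)  | ((knights << 15) & NOT_H_FILE)
         | ((knights << 10) & NOT_AB_FILE) | ((knights <<  6) & NOT_GH_FILE)
         | ((knights >> 17) & NOT_H_FILE)  | ((knights >> 15) & NOT_A_FILE)
         | ((knights >> 10) & NOT_GH_FILE) | ((knights >>  6) & NOT_AB_FILE);
}

/* Count set bits by clearing the lowest one each iteration. */
static int popcount64(Bitboard b) {
    int n = 0;
    while (b) {
        b &= b - 1;
        n++;
    }
    return n;
}

/* Hypothetical eval term: number of weak enemy pawns attacked by at least
   one of our knights (a pawn hit by two knights is still counted once). */
static int weak_pawns_attacked_by_knights(Bitboard my_knights, Bitboard their_weak_pawns) {
    return popcount64(knight_attacks(my_knights) & their_weak_pawns);
}
```

A knight on b1 attacks a3, c3 and d2, so against weak pawns on c3 and d4 the term is 1. All knights are handled at once; the cost does not grow with the number of knights on the board.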


>
>I'm not sure Tiger is doing this, but it sure is attacking weak pawns :)
>
>It works brilliantly. Even doing it in a preprocessor is better than NOT
>doing it!
>
>Crafty's NPS will go down by a factor of 3 if you add that to the evaluation.
>And, just like me, I bet you would want to do it in the evaluation and not
>in a preprocessor.
>
>Preprocessing sucks.
>
>Crafty would play a hell of a lot better if it had attacks in evaluation.
>You might have known it; I definitely know it. But we both know
>it's impossible to do this quickly with bitboards...

We don't both know it at all, since earlier versions of my program did it.  I
started off with nothing but pure mobility for all pieces, then added in a
GNU-like evaluation for attacked pieces.  Was not hard.  Was not slow.  But
I didn't like the result and decided to evaluate things in other ways and let
the search take care of multiple attacks.
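Plain mobility is likewise a population count over an attack set. The sketch below uses the simplest possible rook-attack generator, a loop that walks each ray until it hits a blocker; that choice is an assumption made for readability, and a production bitboard program would normally replace the loop with a table-driven scheme such as rotated bitboards. The function names and the a1 = bit 0 numbering are hypothetical.

```c
#include <stdint.h>

typedef uint64_t Bitboard;  /* assumption: a1 = bit 0, ..., h8 = bit 63 */

static int popcount64(Bitboard b) {
    int n = 0;
    while (b) {
        b &= b - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Rook attacks from `sq` given all occupied squares; each ray includes its first blocker. */
static Bitboard rook_attacks(int sq, Bitboard occupied) {
    static const int file_step[4] = { 1, -1, 0,  0 };
    static const int rank_step[4] = { 0,  0, 1, -1 };
    Bitboard attacks = 0;
    for (int d = 0; d < 4; d++) {
        int f = (sq & 7) + file_step[d];
        int r = (sq >> 3) + rank_step[d];
        while (f >= 0 && f < 8 && r >= 0 && r < 8) {
            Bitboard bit = 1ULL << (r * 8 + f);
            attacks |= bit;
            if (occupied & bit)
                break;  /* the ray cannot continue past a piece */
            f += file_step[d];
            r += rank_step[d];
        }
    }
    return attacks;
}

/* Mobility = attacked squares not occupied by our own pieces (captures count). */
static int rook_mobility(int sq, Bitboard occupied, Bitboard own_pieces) {
    return popcount64(rook_attacks(sq, occupied) & ~own_pieces);
}
```

A lone rook on a1 has mobility 14. With a friendly pawn on a2 it drops to 7; with an enemy piece on a2 it is 8, because the capture square counts. The same attack bitboard can be ANDed with enemy pieces to get an "attacked pieces" term at almost no extra cost.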



>
>>
>>>
>>>>you have to think "data density".  Which is what made our move generator and
>>>>some evaluations on the Cray so very fast.  But not until you start "thinking
>>>>outside the box" and asking "how can this architecture help me do things more
>>>>efficiently?" rather than "How can I make my program run on this architecture?"
>>>
>>>>Those are two vastly different questions.  With vastly different answers.
>>>
>>>Here we all agree, but for me, putting things in a bitboard in general
>>>means I have 1 bit worth of information about something. That's too little
>>>information for me!
>>>
>>>So if I ever use 64 bits, then it's for those parts of my program where
>>>I can profit.
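The "data density" point, and the objection that a bitboard holds only one bit per square, can both be illustrated with pawn attacks. One shift-and-mask per capture direction handles all eight pawns at once, and combining two such bitboards recovers more than one bit of information, for example which squares are attacked by two pawns. White pawns moving toward rank 8 and a1 = bit 0 are assumed; the function names are hypothetical.

```c
#include <stdint.h>

typedef uint64_t Bitboard;  /* assumption: a1 = bit 0, ..., h8 = bit 63; white moves toward rank 8 */

#define NOT_A_FILE 0xfefefefefefefefeULL
#define NOT_H_FILE 0x7f7f7f7f7f7f7f7fULL

/* Captures toward the a-file (+7); an a-file pawn would wrap onto the h-file, so mask it off. */
static Bitboard white_pawn_attacks_west(Bitboard pawns) {
    return (pawns << 7) & NOT_H_FILE;
}

/* Captures toward the h-file (+9); an h-file pawn would wrap onto the a-file. */
static Bitboard white_pawn_attacks_east(Bitboard pawns) {
    return (pawns << 9) & NOT_A_FILE;
}

/* Every square attacked by at least one white pawn, all pawns in parallel. */
static Bitboard white_pawn_attacks(Bitboard pawns) {
    return white_pawn_attacks_west(pawns) | white_pawn_attacks_east(pawns);
}

/* Squares attacked by two white pawns: a second "bit" per square from two bitboards. */
static Bitboard white_pawn_double_attacks(Bitboard pawns) {
    return white_pawn_attacks_west(pawns) & white_pawn_attacks_east(pawns);
}
```

With pawns on c2 and e2, `white_pawn_attacks` is {b3, d3, f3} and `white_pawn_double_attacks` is {d3}. The work is the same whether there is one pawn or eight.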
>>>
>>>What I still do not understand is why Crafty is so slow on a 64-bit Sun
>>>machine. Everything is 64 bits there. For me a Sun processor performs
>>>the same as a PII processor at the same clock speed.
>>
>>First, the Sun is a piece of trash.  Even the current operating systems are
>>not "64 bits".  The SPARC is a "pseudo-64-bit machine" that is worthless.  Try
>>a real 64-bit machine from MIPS, HP, Compaq, Cray, etc. to see what a _real_
>>64-bit machine can do.  Don't base your opinion on a piece of trash.
>>
>>
>>
>>>
>>>How is Crafty performing on it?
>>>
>>>IMHO the only reason why Crafty isn't so slow on Intel processors is
>>>that you have way fewer branches than I have in DIEP.
>>
>>No... it is because I can keep both scalar pipes busy a lot more.
>>
>>
>>>
>>>I am using complex patterns in DIEP's evaluation function, so there are
>>>branches everywhere. In Crafty the branches that are there are easier
>>>to predict.
>>
>>Want me to count the branches in my evaluation function?  They blow out the
>>branch target buffer quickly.
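The branch-count argument can be made concrete with a small contrast. Counting pawns on a file from a mailbox array tests each square with a data-dependent comparison, while the bitboard version is an AND plus a branch-free population count. This only illustrates the two styles under hypothetical piece codes; it is not code from Crafty or DIEP, and an optimizing compiler may well turn the mailbox comparison into a conditional move or set instruction rather than a real branch.

```c
#include <stdint.h>

enum { EMPTY = 0, WHITE_PAWN = 1 };  /* hypothetical mailbox piece codes */

/* Mailbox style: one data-dependent test per square on the file (a1 = index 0). */
static int pawns_on_file_mailbox(const int board[64], int file) {
    int n = 0;
    for (int rank = 0; rank < 8; rank++)
        if (board[rank * 8 + file] == WHITE_PAWN)
            n++;
    return n;
}

/* Branch-free population count (classic SWAR bit-twiddling). */
static int popcount_swar(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Bitboard style: mask the file, then count; no data-dependent branches. */
static int pawns_on_file_bitboard(uint64_t white_pawns, int file) {
    const uint64_t file_a = 0x0101010101010101ULL;  /* a1..a8 with a1 = bit 0 */
    return popcount_swar(white_pawns & (file_a << file));
}
```

Both functions return 2 for the e-file when white has pawns on e2, e3 and d4. The bitboard version is a fixed sequence of independent integer operations, which is the kind of code that keeps two integer pipes busy; whether that wins in a whole program is an empirical question, as the rest of the thread shows.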
>>
>>
>>
>>>
>>>So on 64-bit processors where the branch misprediction penalty isn't
>>>big, Crafty will perform very badly compared to DIEP.
>>
>>There is no such architecture, however.
>>
>>>
>>>On an Alpha the branch misprediction penalty is HUGE.
>>>
>>>So on a 21164 processor, which had only 8KB of L1 cache, I did very badly
>>>with DIEP. On a 4-processor 21264 machine DIEP probably does worse than on
>>>a dual 1.5GHz K7 Palomino, which hopefully gets released next month.
>>>
>>>Now the branch misprediction penalty of the K7 isn't very good, but
>>>on the 21264 it's way worse. What the 21264 has to prevent mispredictions
>>>a bit is a clever system of 2 killer tables that reference each other.
>>>
>>>However, the size of those tables is so small compared to the number of
>>>branches in DIEP that it won't speed me up a single %!
>>>
>>>A 633MHz 21164 processor performed for DIEP like a 380MHz PII.
>>
>>If that is true, you have _serious_ program design problems.  It should never
>>be slower.  The processor is better.
>>
>>
>>
>>
>>>
>>>Now that was some time ago, of course. Nowadays DIEP would do worse on
>>>that 21164 compared to the PII, because code+data size has become
>>>bigger.
>>>
>>>I have a similar thing in mind for the 21264. While on paper
>>>it can do very well because it does 4 instructions a clock, it's very
>>>likely that a K7 at the same MHz completely outguns it for me.
>>>
>>>For Crafty the 21264 is not as bad as it is for me, because you suffer
>>>relatively less from the HUGE branch misprediction penalty than I do
>>>with DIEP!
>>>
>>>It's not that the 64-bit data structure is so good on 64 bits; the
>>>problem is that the 21264 design has a serious problem with branches!
>>>
>>>Best regards,
>>>Vincent
>>>
>>
>>
>>No.  It is _both_.  However, I have far too many branches for the prediction
>>stuff to work well, so I take the same hits.  I get killed on the xeons for
>>this same reason, as it is easy to count the mispredictions using the machine-
>>specific registers built in to the cpu.
>>
>>
>>
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.