Computer Chess Club Archives


Subject: Re: Opteron Instruction Set

Author: Robert Hyatt

Date: 15:30:07 02/03/04


On February 03, 2004 at 16:18:46, Robert Hyatt wrote:

>On February 03, 2004 at 15:49:27, Vincent Diepeveen wrote:
>
>>On February 03, 2004 at 12:15:23, Robert Hyatt wrote:
>>
>>>On February 03, 2004 at 11:45:20, Vincent Diepeveen wrote:
>>>
>>>>On February 03, 2004 at 03:13:29, Gerd Isenberg wrote:
>>>>
>>>>>On February 03, 2004 at 01:03:29, Jay Urbanski wrote:
>>>>>
>>>>>>On February 02, 2004 at 22:41:19, Robert Hyatt wrote:
>>>>>>
>>>>>>>On February 02, 2004 at 20:06:29, David Rasmussen wrote:
>>>>>>>
>>>>>>>>Does the Opteron have firstBit, lastBit and popCount instructions? Or at least
>>>>>>>>something that makes calculating them easier than on x86-32?
>>>>>>>>
>>>>>>>>/David
>>>>>>>
>>>>>>>
>>>>>>>Has the same BSF/BSR instructions, but no popcnt that I have found.  Note
>>>>>>>that BSF/BSR work on 64 bit values if you want.  I have inline asm to do
>>>>>>>all three for gcc if you are interested.
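A minimal sketch of the three operations asked about, for readers who want something concrete. It uses GCC's `__builtin_ctzll`, `__builtin_clzll`, and `__builtin_popcountll` rather than the inline asm mentioned above, which is not shown in this thread. The function names and the bit-numbering convention (bit 0 is the least significant bit) are assumptions made for illustration, not Crafty's actual definitions.

```c
#include <stdint.h>

/* Index of the least significant set bit (the result BSF produces).
   Undefined for b == 0, like the underlying builtin. */
static inline int first_bit(uint64_t b) {
    return __builtin_ctzll(b);
}

/* Index of the most significant set bit (the result BSR produces).
   Undefined for b == 0. */
static inline int last_bit(uint64_t b) {
    return 63 - __builtin_clzll(b);
}

/* Number of set bits; GCC falls back to a software routine when the
   target has no hardware population-count instruction. */
static inline int pop_count(uint64_t b) {
    return __builtin_popcountll(b);
}

/* Portable alternative: clear the lowest set bit until none remain.
   The loop runs once per set bit, so it is cheap on sparse bitboards. */
static inline int pop_count_loop(uint64_t b) {
    int n = 0;
    while (b) {
        b &= b - 1;  /* clears the least significant set bit */
        n++;
    }
    return n;
}
```

On x86-64, GCC can compile the first two helpers down to single bit-scan instructions that operate directly on a 64-bit register.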
>>>>>>
>>>>>>I understand there is a popcount instruction.  I also understand it's
>>>>>>undocumented.
>>>>>
>>>>>Do you have any opcode or further hints?
>>>>>That would be great - a 4 cycle vector path popcount ;-)
>>>>
>>>>And deadslow.
>>>
>>>
>>>Certainly not slower than what we have to do at present...
>>
>>Yes it is slower, because no one using bitboards ever thought of writing such
>>stuff incrementally.
>>
>>No popcnt's needed then.
>>
>>In fact you can write the majority of Crafty's simple eval incrementally, and
>>it's way, way faster.
>
>No it isn't.  Again, why don't you look first and understand, and _then_ make
>comments with some information to back you up?
>
>Look at where I use PopCnt().  Hint:  It is _not_ for computing mobility.  That
>is a simple table lookup for me with no popcnt needed.
>
>
>>
>>Note that I'm not doing evaluation incrementally in DIEP (though I do update
>>some data structures incrementally), because I am busy making a huge
>>evaluation function.
>>
>>Readability and portability above anything else!
>>
>>Mixing arrays of unsigned 8-bit values, as several BSF/BSR replacement
>>solutions on Opteron do, with signed ints, unsigned long long, and unsigned
>>int is something I find very bad practice.
>
>Opteron needs one instruction.  I don't know what you are talking about.
>
>>
>>I could trivially get DIEP 10% faster on Opteron by rewriting all 'int'
>>arrays to 'unsigned int'. In that case unsigned int gets mixed with signed
>>int at several spots in the program.
>>
>>I find that detestable however.
>>
>>The same logic applies here to doing the entire evaluation incrementally in
>>bitboards. You can throw away most of your bitboard logic of course, as
>>incremental stuff goes faster without bitboards, but with or without
>>bitboards, doing it incrementally is *way* faster and you can avoid expensive
>>stuff like pop counts.
>
>Why don't you look where they are done.  Again, hint:  "behind the pawn hashing"
>so there is no cost to speak of.  Look first, comment last.  Not vice versa.
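A rough sketch of what "behind the pawn hashing" means in practice: the expensive pawn-structure evaluation (population counts included) runs only when the pawn configuration is not already cached. Everything named here (`PawnEntry`, `PAWN_HASH_SIZE`, `evaluate_pawns`, the toy scoring formula) is hypothetical and chosen for illustration. It is not Crafty's code.

```c
#include <stdint.h>

#define PAWN_HASH_SIZE 4096  /* hypothetical size; must be a power of two */

typedef struct {
    uint64_t key;    /* signature of the pawn structure */
    int      score;  /* cached pawn-structure score */
    int      valid;  /* nonzero once the slot has been filled */
} PawnEntry;

static PawnEntry pawn_table[PAWN_HASH_SIZE];
static long slow_evals;  /* how many times the expensive path actually ran */

/* Stand-in for an expensive pawn evaluation that uses population counts.
   The scoring itself is a toy: 100 points per pawn of difference. */
static int evaluate_pawns_slow(uint64_t white_pawns, uint64_t black_pawns) {
    slow_evals++;
    return 100 * (__builtin_popcountll(white_pawns)
                - __builtin_popcountll(black_pawns));
}

/* Return the cached score on a hit; compute and store it on a miss. */
int evaluate_pawns(uint64_t key, uint64_t white_pawns, uint64_t black_pawns) {
    PawnEntry *e = &pawn_table[key & (PAWN_HASH_SIZE - 1)];
    if (e->valid && e->key == key)
        return e->score;  /* hit: no population counts executed */
    e->key   = key;
    e->score = evaluate_pawns_slow(white_pawns, black_pawns);
    e->valid = 1;
    return e->score;
}
```

Pawn structures change far less often than the rest of the position, so in a scheme like this most probes would be expected to hit, and the population counts would be paid for only on the rare miss.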
>
>BTW population counts are _not_ "way expensive".  PopCnt() barely even shows up
>in profiling, generally...  last time it was here:
>
>  0.11     88.79     0.10   359435     0.00     0.00  HistoryRefutation
>  0.10     88.88     0.09   228891     0.00     0.00  InterposeSquares
>  0.09     88.96     0.08                             PopCnt
>
>
>That is a "whopping" .09%.  And yes I mean .09% _not_ 9%.
>
>Do you _ever_ get anything right nowadays???


Aw rats.  My math was bad.  I should have used Diepeveen math.  Then reducing
the time spent in PopCnt() could double my NPS and get me 2-3 more plies of
search.  I keep forgetting that if I totally eliminate that .09% overhead, my
NPS will double.  Regardless of what my calculator says...
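For anyone who wants to check the arithmetic behind the .09% figure: if a routine accounts for a fraction f of total run time, eliminating it entirely gives a speedup of at most 1 / (1 - f). A minimal sketch follows; the helper name `max_speedup` is hypothetical.

```c
/* Upper bound on overall speedup when a routine that accounts for
   `fraction` of total run time (0 <= fraction < 1) is removed entirely. */
double max_speedup(double fraction) {
    return 1.0 / (1.0 - fraction);
}
```

With f = 0.0009 (the .09% from the profile above) the bound is about 1.0009, i.e. roughly a 0.09% gain in NPS. Doubling NPS would require the routine to account for half of the run time (f = 0.5).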




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.