Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: question about speed in hash tables

Author: Vincent Diepeveen
Date: 11:35:40 10/18/02
On October 17, 2002 at 17:36:32, Robert Hyatt wrote:

There *can* be a difference, but there doesn't need to be a
difference :)

The important lesson is to avoid toying continuesly with
global arrays!

Also linked hashtables, though they sound nice and cool, they
suck ass for computerchess!!

>On October 17, 2002 at 16:42:02, Vincent Diepeveen wrote:
>
>>On October 17, 2002 at 14:56:43, Robert Hyatt wrote:
>>
>>>On October 17, 2002 at 12:01:37, Vincent Diepeveen wrote:
>>>
>>>>On October 17, 2002 at 06:20:11, Uri Blass wrote:
>>>>
>>>>>Today my repetition detection is not done based on hash tables and I plan to do
>>>>>it faster by hash tables.
>>>>
>>>>Why would this be faster, taking x86 cpu architecture into account?
>>>>
>>>>
>>>>>After every makemove I calculate the new hash key for the hash tables
>>>>
>>>>You don't incrementally calculate the zobrist hashing yet by just
>>>>xor-ing the pieces you moved to the hashkey?
>>>>
>>>>>but I do not have an array of all the hash keys and I use a global varaible
>>>>>__int64 zob to have the hash key.
>>>>
>>>>i'm using for the x86 architecture  2 x 'unsigned int' for the hashkey.
>>>>The reason is that it was faster than a single 'unsigned _int64'.
>>>>
>>>>Compilers not so efficient yet, though intel c++ might be doing this
>>>>more efficient than others :)
>>>>
>>>>>I plan to add an array zobkey[max_plies_of_game] for hash keys
>>>>>My question is what is faster:
>>>>
>>>>>1)Doing all the calculation on zob and after finishing them to do
>>>>>zobkey[hply]=zob;
>>>>
>>>>this is by far fastest of course.
>>>
>>>No it isn't, if zob is global.  you have to update _both_ zob and zobkey[hply]
>>>in
>>>memory.  If zob is local, the compiler might be able to keep it in a register
>>>for its
>>>scope and never do a write to memory until you write the value to zobkey...
>>
>>there is no real difference between local and global unless you write to
>>other cache lines before doing all modifications to the global zob.
>
>There is a _huge_ difference.
>
>Example:
>
>local   int i;
>
>for (i=0;i<n;i++) {
>
>code using i
>
>}
>
>The compiler may _never_ use memory at all.  i can legally be placed in a
>register
>and kept there, and if the dependency graph shows that i is never used in this
>procedure
>after the above loop, there is no reason to take the time to write it back to
>memory.
>
>Cost = 0
>
>if int i is outside the procedure, so that it is global, then the compiler
>_must_ write the
>final value back to i (n) because it can't know whether i is used in other
>procedures or not.
>
>Cost = 1 memory write, several hundred clock cycles + lost bandwidth caused by
>that write.
>
>>
>>To keep it in a single register (which would be only advantage to do
>>it local) you cannot put the makemove() function in a special function
>>at all (or you have to inline it everywhere).
>>
>>That's not neat programming.
>
>Look at his code.  The scalar value _could_ be local, which avoids writing it
>back to memory.
>if it is global, that won't happen and the cost goes up.  If it needs to be
>global, of course, then
>you are stuck.  But local vs global is a significant issue to a compiler's
>optimizer...
>
>>
>>>
>>>>
>>>>>2)Doing all the calculations on zobkey[hply]
>>>>
>>>>extra array references cause extra instructions such as
>>>>indirect accessing the array by [EAX].
>>>>
>>>>Way faster is all operations onto a single register.
>>>>
>>>
>>>
>>>
>>>However that has nothing to do with his question.  You load the value _once_ and
>>>then
>>>fiddle with it in a register.  Then you store it back once.
>>
>>you have to start with a value already. So the choice is simple.
>>Either you start with a value from an array (slow), or you do stuff
>>to a global variable, which means it gets stored only after you modify
>>other cache lines.
>
>Other cache lines have nothing whatsoever to do with this.  You keep mixing up
>NUMA memory issues with everything.  This isn't a NUMA or SMP issue.  It is
>a pure optimization issue...
>
>
>
>>
>>>
>>>
>>>
>>>>>I guess that I am going to choose 1 because it is more simple and I guess that
>>>>>the difference in speed is less than 0.1% but I am interested to know what is
>>>>>faster.
>>>>
>>>>Well it should take very little system time in total anyway, but
>>>>working on 1 global variable is always faster than doing it by using
>>>>arrays.
>>>
>>>
>>>Vincent, this is simply wrong.  You make a gross statement that has more
>>>exceptions than cases that it follows.
>>
>>>For example:
>>
>>>int x[2]; where that is a local array.  It is more than possible that the two
>>>elements get put into registers and _no_ memory reads/writes are done at all.
>>>If you work with a _global_ value, the compiler loses some of its ability to
>>>optimize away writes back to the value.  For local data, the compiler produces
>>
>>No it doesn't Bob. As you could know the processor only stores cache line
>>after you do writes to other cache lines.
>
>wrong.  I don't have to do writes to other cache lines.  I can do _reads_ to
>other
>cache lines causing a "dirty line" to be written back to memory and removed from
>cache.
>
>what are you talking about???
>
>
>>
>>The question raised was whether he had to use arrays or a global variable.
>>In that case the answer is a global variable.
>
>I still don't follow.  Arrays can be global.  Scalars can be global.
>
>Global variables hurt optimization...
>
>
>>
>>Of course i'm using local variables everywhere, but the principle is the
>>same simply that it's faster than fiddling with arrays.
>>
>>If the question is whether referencing indirect is faster versus direct
>>reference, then the question is direct reference.
>>
>>That was the raised question.
>
>That wasn't the question I was talking about.  He asked was it better to do the
>computations in a scalar and later store the result into an array.  I said "not
>if
>the scalar is global" because the writes to the scalar have to be done, _and_
>the
>write to the array has to be done.  If the scalar is local, the write might be
>optimized away...
>
>
>>
>>Apart from that global variables are only needed to put in local variables
>>when you fiddle with a lot of different stuff at the same time, that is
>>do writes to other cache lines *before* all modifications to the global
>>variable are done.
>
>Wrong concept of "local".  We are not talking about "in local memory" or "in
>local
>cache".  This is just plain old ANSI-C local / global scope of a variable...
>
>
>>
>>int a;  // global
>>
>>function() {
>>  ..
>>  a ^= movelookup[move&mask];
>>  a ^= iscapture[move&capturemask];
>>  ..
>>  other code
>>  ..
>>}
>>
>>No way to beat that with a local variable.
>
>
>Sure there is.  Just make it local and notice that the compiler might not even
>do a memory write...
>
>If it can keep the value in a register while it is being used, no memory write
>will
>be done.  If it is global, it gets written whether it is ever used again or
>not...
>
>>
>>>a dependency graph and knows when a value gets written back but never used
>>>again, and it eliminates the write back.  For global variables this can't be
>>>done
>>>because the compiler can't tell who might use that value later on after this
>>>procedure has returned...
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>>Doing all the calculations on zobkey[hply] seems to have one less arithmetic
>>>>>calculation but more array calls.
>>>>
>>>>Arithmetic is very cheap (exception: BSF and BSR vector instructions)
>>>>
>>>>In general you should assume in the future that processors (take the
>>>>mckinley as example) will do more instructions in either a bundle or
>>>>within a single clock. Memory will get slower. So instructions that
>>>>act upon a single register will be very slow unless it is complex
>>>>instructions like BSF. Even multiplying i am using scrupuleous above
>>>>adding a single small local array!
>>>>
>>>>a hashtable is way slower because it eats more memory than a single
>>>>array [hply]. That O(1) lookup is of course way slower than doing
>>>>a lookup in that array with hashnumbers which is already inside
>>>>perhaps even your L1 cache already.
>>>>
>>>>Also in order to get a hashtable correctly to work you need a linked
>>>>list hashtable. At paper that sounds cool perhaps, but it is hell slow.
>>>>
>>>>>Uri
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.