Author: Robert Hyatt
Date: 14:36:32 10/17/02
Go up one level in this thread
On October 17, 2002 at 16:42:02, Vincent Diepeveen wrote:
>On October 17, 2002 at 14:56:43, Robert Hyatt wrote:
>
>>On October 17, 2002 at 12:01:37, Vincent Diepeveen wrote:
>>
>>>On October 17, 2002 at 06:20:11, Uri Blass wrote:
>>>
>>>>Today my repetition detection is not done based on hash tables and I plan to do
>>>>it faster by hash tables.
>>>
>>>Why would this be faster, taking x86 cpu architecture into account?
>>>
>>>
>>>>After every makemove I calculate the new hash key for the hash tables
>>>
>>>You don't incrementally calculate the zobrist hashing yet by just
>>>xor-ing the pieces you moved to the hashkey?
>>>
>>>>but I do not have an array of all the hash keys and I use a global varaible
>>>>__int64 zob to have the hash key.
>>>
>>>i'm using for the x86 architecture 2 x 'unsigned int' for the hashkey.
>>>The reason is that it was faster than a single 'unsigned _int64'.
>>>
>>>Compilers not so efficient yet, though intel c++ might be doing this
>>>more efficient than others :)
>>>
>>>>I plan to add an array zobkey[max_plies_of_game] for hash keys
>>>>My question is what is faster:
>>>
>>>>1)Doing all the calculation on zob and after finishing them to do
>>>>zobkey[hply]=zob;
>>>
>>>this is by far fastest of course.
>>
>>No it isn't, if zob is global. you have to update _both_ zob and zobkey[hply]
>>in
>>memory. If zob is local, the compiler might be able to keep it in a register
>>for its
>>scope and never do a write to memory until you write the value to zobkey...
>
>there is no real difference between local and global unless you write to
>other cache lines before doing all modifications to the global zob.
There is a _huge_ difference.
Example:
local int i;
for (i=0;i<n;i++) {
code using i
}
The compiler may _never_ use memory at all. i can legally be placed in a
register
and kept there, and if the dependency graph shows that i is never used in this
procedure
after the above loop, there is no reason to take the time to write it back to
memory.
Cost = 0
if int i is outside the procedure, so that it is global, then the compiler
_must_ write the
final value back to i (n) because it can't know whether i is used in other
procedures or not.
Cost = 1 memory write, several hundred clock cycles + lost bandwidth caused by
that write.
>
>To keep it in a single register (which would be only advantage to do
>it local) you cannot put the makemove() function in a special function
>at all (or you have to inline it everywhere).
>
>That's not neat programming.
Look at his code. The scalar value _could_ be local, which avoids writing it
back to memory.
if it is global, that won't happen and the cost goes up. If it needs to be
global, of course, then
you are stuck. But local vs global is a significant issue to a compiler's
optimizer...
>
>>
>>>
>>>>2)Doing all the calculations on zobkey[hply]
>>>
>>>extra array references cause extra instructions such as
>>>indirect accessing the array by [EAX].
>>>
>>>Way faster is all operations onto a single register.
>>>
>>
>>
>>
>>However that has nothing to do with his question. You load the value _once_ and
>>then
>>fiddle with it in a register. Then you store it back once.
>
>you have to start with a value already. So the choice is simple.
>Either you start with a value from an array (slow), or you do stuff
>to a global variable, which means it gets stored only after you modify
>other cache lines.
Other cache lines have nothing whatsoever to do with this. You keep mixing up
NUMA memory issues with everything. This isn't a NUMA or SMP issue. It is
a pure optimization issue...
>
>>
>>
>>
>>>>I guess that I am going to choose 1 because it is more simple and I guess that
>>>>the difference in speed is less than 0.1% but I am interested to know what is
>>>>faster.
>>>
>>>Well it should take very little system time in total anyway, but
>>>working on 1 global variable is always faster than doing it by using
>>>arrays.
>>
>>
>>Vincent, this is simply wrong. You make a gross statement that has more
>>exceptions than cases that it follows.
>
>>For example:
>
>>int x[2]; where that is a local array. It is more than possible that the two
>>elements get put into registers and _no_ memory reads/writes are done at all.
>>If you work with a _global_ value, the compiler loses some of its ability to
>>optimize away writes back to the value. For local data, the compiler produces
>
>No it doesn't Bob. As you could know the processor only stores cache line
>after you do writes to other cache lines.
wrong. I don't have to do writes to other cache lines. I can do _reads_ to
other
cache lines causing a "dirty line" to be written back to memory and removed from
cache.
what are you talking about???
>
>The question raised was whether he had to use arrays or a global variable.
>In that case the answer is a global variable.
I still don't follow. Arrays can be global. Scalars can be global.
Global variables hurt optimization...
>
>Of course i'm using local variables everywhere, but the principle is the
>same simply that it's faster than fiddling with arrays.
>
>If the question is whether referencing indirect is faster versus direct
>reference, then the question is direct reference.
>
>That was the raised question.
That wasn't the question I was talking about. He asked was it better to do the
computations in a scalar and later store the result into an array. I said "not
if
the scalar is global" because the writes to the scalar have to be done, _and_
the
write to the array has to be done. If the scalar is local, the write might be
optimized away...
>
>Apart from that global variables are only needed to put in local variables
>when you fiddle with a lot of different stuff at the same time, that is
>do writes to other cache lines *before* all modifications to the global
>variable are done.
Wrong concept of "local". We are not talking about "in local memory" or "in
local
cache". This is just plain old ANSI-C local / global scope of a variable...
>
>int a; // global
>
>function() {
> ..
> a ^= movelookup[move&mask];
> a ^= iscapture[move&capturemask];
> ..
> other code
> ..
>}
>
>No way to beat that with a local variable.
Sure there is. Just make it local and notice that the compiler might not even
do a memory write...
If it can keep the value in a register while it is being used, no memory write
will
be done. If it is global, it gets written whether it is ever used again or
not...
>
>>a dependency graph and knows when a value gets written back but never used
>>again, and it eliminates the write back. For global variables this can't be
>>done
>>because the compiler can't tell who might use that value later on after this
>>procedure has returned...
>>
>>
>>
>>
>>
>>>
>>>>Doing all the calculations on zobkey[hply] seems to have one less arithmetic
>>>>calculation but more array calls.
>>>
>>>Arithmetic is very cheap (exception: BSF and BSR vector instructions)
>>>
>>>In general you should assume in the future that processors (take the
>>>mckinley as example) will do more instructions in either a bundle or
>>>within a single clock. Memory will get slower. So instructions that
>>>act upon a single register will be very slow unless it is complex
>>>instructions like BSF. Even multiplying i am using scrupuleous above
>>>adding a single small local array!
>>>
>>>a hashtable is way slower because it eats more memory than a single
>>>array [hply]. That O(1) lookup is of course way slower than doing
>>>a lookup in that array with hashnumbers which is already inside
>>>perhaps even your L1 cache already.
>>>
>>>Also in order to get a hashtable correctly to work you need a linked
>>>list hashtable. At paper that sounds cool perhaps, but it is hell slow.
>>>
>>>>Uri
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.