Computer Chess Club Archives


Subject: Re: FINAL ANSWER

Author: Robert Hyatt

Date: 12:57:12 12/10/03


On December 10, 2003 at 15:00:18, Gerd Isenberg wrote:

>On December 10, 2003 at 09:08:05, Robert Hyatt wrote:
>
>>On December 10, 2003 at 01:10:23, Russell Reagan wrote:
>>
>>>On December 10, 2003 at 00:20:35, Slater Wold wrote:
>>>
>>>>144 - SuSe 8 - gcc 33 -m32 = 1109
>>>>144 - SuSe 8 - gcc 33 -m64 = 1562
>>>>
>>>>41% going from 32 to 64 bit on Crafty!
>>>>
>>>>And others:
>>>>
>>>>144 - SuSe 8 - ICC 7.0 (32)= 1199
>>>>144 - W2003E - ICC 7.0 (32)= 1230
>>>
>>>I think there are more questions to answer. One is the one you just answered,
>>>which is how much of a speedup we can get from the 64-bit compilation alone.
>>>Another is how much of a speedup we get from the Opteron's hardware (e.g.
>>>32-bit Athlon vs. 64-bit Athlon/Opteron).
>>>
>>>Another is how much of a speedup non-bitboard programs will get from the 64-bit
>>>hardware and 64-bit compilation. Maybe someone could compile some non-bitboard
>>>programs. I guess even TSCP's bench command might give us some answers.
>>>
>>>One question I have is, does the 32-bit gcc compilation on 64-bit hardware still
>>>take advantage of all 16 general purpose registers? Or does it compile it for a
>>>32-bit executable you could run on a 32-bit CPU?
>>
>>
>>When you specify -m32, you get an X86 executable, which means no unusual
>>registers or anything.  -m64 (default on the box I am testing on) adds
>>both 64 bit registers and the extra 8 registers %r8-%r15...
>
>Yes, with x86-32 we have usually six or seven 32-bit registers,
>eax,ebx,ecx,edx,esi,edi,(ebp), keeping up to three bitboards (most likely only
>two). With x86-64 there are theoretically up to 15 32-bit as well as bitboard
>registers - five times more registers for bitboards ;-)
>
>The drawbacks are additional instruction prefixes and 64-bit long addresses,
>and therefore even longer direct data access instructions and doubled memory
>space for storing pointers or references. I would prefer a tiny 32/64-bit mode
>with 32-bit addresses but all registers, made 64-bit wide with a prefix.

Not there.  In fact, you can't even do a bsf %r8, %eax to get a 32 bit
counter for a 64 bit value.  That surprised me, since bsf is not going
to produce a result > 8 bits anyway.  :)
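
For illustration only, here is how the same operation looks one level up, in C: a 64-bit bitboard goes in and a small int comes out, and the compiler picks the operand sizes. This is a minimal sketch, not code from Crafty. The name lowest_set_bit and the -1 result for an empty bitboard are assumptions made for the example, and __builtin_ctzll is a GCC/Clang builtin that may not exist in the compilers benchmarked in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..63) of the least significant set bit of a 64-bit bitboard.
   Hypothetical helper: returning -1 for an empty bitboard is a choice
   made for this sketch, since a bit scan of zero has no useful answer. */
int lowest_set_bit(uint64_t bb)
{
    int n;

    if (bb == 0)
        return -1;
#if defined(__GNUC__)
    /* GCC/Clang builtin: count trailing zeros of an unsigned long long. */
    n = __builtin_ctzll(bb);
#else
    /* Portable fallback: shift right until the low bit is set. */
    n = 0;
    while ((bb & 1) == 0) {
        bb >>= 1;
        n++;
    }
#endif
    /* The index always fits in 6 bits, whatever register width is used. */
    assert(n >= 0 && n < 64);
    return n;
}
```

Calling lowest_set_bit(0x18) gives 3, since bit 3 is the lowest bit set in 0x18.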

I have not investigated (yet) what happens if you load a 32 bit value
into a 32 bit register, then use the 64 bit register.  Do the upper
32 bits get clobbered (expected) or left alone as in using ah/al in
8 bit land?
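
One way to check this is a small test like the sketch below. It relies on the architecture's documented rule that, in 64-bit mode, a write to a 32-bit register zero-extends into the upper 32 bits, while 8-bit (and 16-bit) writes leave the rest of the register alone. The function names are hypothetical, the inline assembly assumes GCC or Clang on x86-64, and on any other target the code only models the rule in plain C rather than measuring it.

```c
#include <assert.h>
#include <stdint.h>

/* Write `low` into the 32-bit half of a register that holds `full`,
   then return the whole 64-bit register. */
uint64_t write_low32(uint64_t full, uint32_t low)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* %k0 names the 32-bit form of operand 0 (e.g. %eax for %rax). */
    __asm__("movl %1, %k0" : "+r"(full) : "r"(low));
    return full;
#else
    /* Not x86-64: model the zero-extension rule instead of testing it. */
    (void)full;
    return (uint64_t)low;
#endif
}

/* Same experiment with an 8-bit write, the al/ah style case. */
uint64_t write_low8(uint64_t full, uint8_t low)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* %b0 names the low-byte form of operand 0 (e.g. %al for %rax). */
    __asm__("movb %1, %b0" : "+r"(full) : "r"(low));
    return full;
#else
    /* Not x86-64: model the "upper bits preserved" rule. */
    return (full & ~(uint64_t)0xFF) | low;
#endif
}

/* Self-check on sample values: the 32-bit write clears the upper half,
   the 8-bit write keeps everything above the low byte. */
void check_partial_writes(void)
{
    assert(write_low32(0xFFFFFFFFFFFFFFFFULL, 0x12345678u)
           == 0x0000000012345678ULL);
    assert(write_low8(0xFFFFFFFFFFFFFFFFULL, 0x12)
           == 0xFFFFFFFFFFFFFF12ULL);
}
```

So under that rule the upper 32 bits do get clobbered (zeroed) by a 32-bit load, unlike the 8-bit case.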

In any case, the box is interesting, and I'm drowning in details.  :)

E.g., Linux memory management is interesting.  You want the local stack to
be in the memory attached to the processor you run on.  But you have to
malloc() the stack before creating the thread.  It turns out that
malloc() doesn't fill in page tables; that happens when the pages are
first referenced, and I can tell Linux "allocate local when possible"
so that the stack faults into the local memory on the processor running
that thread.  Now I have to make sure to glue that thread to that
processor.
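
The sequence described above can be sketched roughly as follows. This is not Crafty's code, only a minimal illustration using glibc pthread calls on Linux. The names search_thread and run_pinned_worker, the 1 MB stack size, and the choice of "first CPU this process is allowed on" are all assumptions made for the example. posix_memalign stands in for malloc() here to get a page-aligned block, and the sketch assumes the kernel's memory policy places a page on the node of the processor that first touches it. The thread library may itself touch a small part of the block while setting up the thread.

```c
#define _GNU_SOURCE   /* cpu_set_t, CPU_SET, pthread_attr_setaffinity_np */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

#define STACK_BYTES (1024 * 1024)   /* hypothetical per-thread stack size */

/* Thread body. Its locals live on the caller-supplied stack, so these
   writes are what first reference (and fault in) those stack pages,
   while the thread is already running on its assigned processor. */
static void *search_thread(void *arg)
{
    volatile char scratch[4096];
    size_t i;

    for (i = 0; i < sizeof scratch; i++)
        scratch[i] = 0;
    *(int *)arg = sched_getcpu();   /* report where we actually ran */
    return NULL;
}

/* Allocate a stack, create one thread glued to a single CPU, wait for it.
   Returns the CPU number used, or -1 on any failure. */
int run_pinned_worker(int *ran_on)
{
    cpu_set_t allowed, one;
    pthread_attr_t attr;
    pthread_t tid;
    void *stack = NULL;
    long page = sysconf(_SC_PAGESIZE);
    int cpu = 0, rc;

    assert(page > 0 && STACK_BYTES % page == 0);

    /* Pick the first CPU this process is allowed to run on. */
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    while (cpu < CPU_SETSIZE && !CPU_ISSET(cpu, &allowed))
        cpu++;
    if (cpu == CPU_SETSIZE)
        return -1;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);

    /* Allocate the stack before the thread exists. The block is not
       written here, so its pages are not yet backed by physical memory. */
    if (posix_memalign(&stack, (size_t)page, STACK_BYTES) != 0)
        return -1;

    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        free(stack);
        return -1;
    }
    rc = pthread_attr_setstack(&attr, stack, STACK_BYTES);
    if (rc == 0)   /* pin before the thread starts, so it never migrates */
        rc = pthread_attr_setaffinity_np(&attr, sizeof one, &one);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, search_thread, ran_on);
    if (rc == 0)
        rc = pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    free(stack);
    return rc == 0 ? cpu : -1;
}
```

Setting the affinity in the thread attributes, rather than from inside the running thread, means the thread never executes anywhere else, so its first stack references already happen on the target processor. On a non-NUMA SMP box the same code still runs; the pinning just has no effect on where memory lands.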

It's loads of fun.  :)

And I don't want to break non-NUMA SMP search either. :)



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.