Computer Chess Club Archives


Subject: Re: Opteron vs. XP

Author: Vincent Diepeveen

Date: 13:04:07 06/30/03


On June 29, 2003 at 23:44:42, Robert Hyatt wrote:

>On June 29, 2003 at 18:08:31, Vincent Diepeveen wrote:
>
>>On June 29, 2003 at 11:27:56, Sune Fischer wrote:
>>
>>>On June 29, 2003 at 11:16:51, Jay Urbanski wrote:
>>>
>>>>On June 29, 2003 at 10:45:25, Sune Fischer wrote:
>>>>
>>>>>Right, but why is this interesting?
>>>>>Honestly, to compile Crafty with a 32-bit compiler for a 64 bit chip smells like
>>>>>incompetence to me.
>>>>>
>>>>>Finally the 32-bit hell is over, for good!! :)
>>>>
>>>>>-S.
>>>>
>>>>It's interesting because the vast majority of chess engines are 32-bit binaries
>>>>without any source code provided.  Granted if you're running Crafty you'd be
>>>>silly to compile it as 32-bit; but most other engines don't provide you that
>>>>option.
>>>>
>>>>It will be several years before we see commercial 64-bit engines for Opteron.
>>>>We may never see them for Itanium.
>>>
>>>Depends on what you mean by 64-bit, I don't expect the non-bitboarders to
>>>switch, but I'd certainly expect them to make use of it other ways, like simply
>>>recompiling to 64-bit and coding for the extra registers.
>>
>>this is nonsense of course. First of all intel will be releasing x86-64 cpu's
>>themselves. So what runs at opteron, will run in future at intel 64 bits cpu's
>>too (don't confuse it with the itanium line which IMHO has failed for the low
>>end market unless they can make them 10 times cheaper than they are now and less
>>buggy and clock them 2 times higher).
>>
>>apart from some trivial advantages, like being able to do some pawnboards in
>>64 bits, 64 bits addressing (which will give some speedup) and zobrist hashing
>>that goes a lot easier, there are other advantages.
>>
>>Take the huge BTB at the opteron.
>>
>>BTB you ask?
>>
>>yes checkout the docs. it stands for branch target buffer. It's like 8 times
>>that of the K7.
>>
>
>Size of the BTB is not _nearly_ so important as how it is implemented. Intel
>has a really cute approach in the PIV and it works exceptionally well.  I think
>the idea came from a researcher somewhere in Texas (Maybe Rice but I am not
>certain).   I'd take the new BTB design at 1/4 the size compared to traditional
>BTB design, without thinking about it very long...

There is no traditional BTB design at all in the Opteron. What is used is the
cute Alpha 21264 solution.

>
>
>>L2 cache is 1MB.
>>
>>Though this would normally be not that relevant, it is relevant now, because
>>the cpu's get so much faster now that L1 cache can't do everything for you
>>anymore.
>>
>>Therefore the improved latency is also great. The latency is like 3 times
>>faster than that of a dual K7-MP or dual P4-Xeon.
>
>That's one I will wait to actually measure.  I don't personally believe 40ns
>is doable, where current machines are hitting 120ns unless using registered
>DRAM.

You are *not* hitting 120 ns. A *random* access into memory (a hashtable probe,
for example) needs more like 380 ns. The 120 ns figure is based on sequential
access, and as we both know chess is not like that. I do not know 100% exactly
how the RAM works technically, but it is like this: it keeps some lines open.
If you keep reading from the same line, that is faster than when it has to
close that line and open another.
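
The open-line behaviour described above can be sketched with a toy model. This is only an illustration under loud assumptions: a single open row at a time, a hypothetical 8 KB row size, a hypothetical 256 MB table, and the 120 ns and 380 ns figures taken from this thread as the open-row and reopen costs. None of these are measurements of real hardware.

```python
import random

ROW_BYTES = 8192   # assumed row (line) size; hypothetical
HIT_NS = 120       # cost when the row is already open (figure quoted in the thread)
MISS_NS = 380      # cost when a row must be closed and another opened (claimed above)


def average_latency_ns(addresses, row_bytes=ROW_BYTES,
                       hit_ns=HIT_NS, miss_ns=MISS_NS):
    """Average cost per access in a toy model with one open row at a time."""
    open_row = None
    total = 0
    for addr in addresses:
        row = addr // row_bytes
        total += hit_ns if row == open_row else miss_ns
        open_row = row
    return total / len(addresses)


TABLE_BYTES = 256 * 1024 * 1024   # hypothetical hashtable size
N = 100_000                       # number of 8-byte lookups

# Sequential 8-byte reads stay inside the same row most of the time.
sequential = [i * 8 for i in range(N)]

# Random 8-byte probes, like hashtable lookups, almost never reuse the open row.
rng = random.Random(1)
scattered = [rng.randrange(TABLE_BYTES // 8) * 8 for _ in range(N)]

seq_avg = average_latency_ns(sequential)
rand_avg = average_latency_ns(scattered)
print(f"sequential: {seq_avg:.1f} ns, random: {rand_avg:.1f} ns")
```

In this model the sequential average sits just above the open-row cost while the random average sits at essentially the reopen cost, which is the gap being argued about. Real memory controllers keep several rows open across banks, so the model only shows the direction of the effect.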

The access times of the dual Xeon and the dual MPs, as I measured them, are
nearer 400 ns. Using my own software of course. I did some hard work there
writing a benchmark. Amazingly, it is also pretty accurate on x86 hardware.

The fastest timings (no surprise) are on single-CPU systems. Those take around
280 ns for a random lookup, on both P4 and K7. Of course I didn't measure
overclocked stuff.

The lookup in all cases is 8 bytes, by the way. I just want to know *latency*.

No bandwidth measurements.

We are in computerchess land, not living in streamland.

Trivially, Opteron won't be 40 ns there. Nor will it be the 90 ns for global
shared memory that they quote for duals; that is only for sequential access. If
it is in reality something like 150 ns, it would already be 2.5 times faster
than any other x86 dual, whereas AMD in fact claimed 20% faster latencies
before launching Opteron.
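
The arithmetic behind that comparison, as a quick sketch. The 380 ns and 400 ns inputs are the dual-system figures claimed earlier in this post and 150 ns is the guess above. Reading "20% faster latencies" as a 20% lower latency is an assumption made here for illustration.

```python
def speedup(old_ns, new_ns):
    """How many times faster new_ns is than old_ns."""
    return old_ns / new_ns


GUESSED_OPTERON_NS = 150  # the "in reality like 150 ns" guess from the post

for dual_ns in (380, 400):
    ratio = speedup(dual_ns, GUESSED_OPTERON_NS)
    print(f"{dual_ns} ns -> {GUESSED_OPTERON_NS} ns: {ratio:.2f}x faster")

# Assumption: "20% faster latencies" means latency drops to 80% of the old value.
claimed_ratio = speedup(1.0, 1.0 - 0.20)
print(f"claimed 20% improvement: {claimed_ratio:.2f}x faster")
```

So a real 150 ns would be roughly twice the improvement that the 20% claim implies.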

Which leads me to the question: when are you going to buy a dual Opteron?

It gets you like 4 million nodes a second in 64 bits against 2.xx million now
with dual Xeon.

>
>>
>>Then there is a lot of small improvements at the chip which are very important.
>>
>>But you miss one of the biggest improvements. For some reason most forget to
>>mention it. For years we had to do with just 8 stupid registers which get
>>swapped away and so on. Now we get 16 registers.
>>
>>That's a *major* improvement.
>>
>>Of course 128 GPR registers is better, but opteron will always be higher clocked
>>than any cpu having 128 general registers.
>
>I have no idea why you would make that statement.  number of registers is
>_not_ related to clock speed in any real way.
>
>
>>
>>I don't want to get into the itanium versus itanic discussions (whether the
>>predication of the itanium is good or not) because coming tuesday i get a big
>>presentation of the altix3000 system which is just installed at SARA.
>>
>>I prefer showing up with 500Mhz processors though at world champs 2003 unless i
>>have a very hard proof that Itanium is faster for me than that (500 mips versus
>>64 itaniums).
>>
>>So far i didn't see any itanium system capable of more than 64 cpu's in 1
>>partition or node (SGI nowadays calls a partition a 'node'. And a node is called
>>a calculation module or something; trivially that is because 99% of all
>>scientists do not know the difference between nodes at a cc-NUMA machine versus
>>nodes at clusters).
>>
>>That 16 registers *will* speed me up *bigtime* at opteron. I would be amazed if
>>it doesn't speedup others a lot.
>>
>>>This would not take years, it will happen as soon as a significant portion of
>>>their customers own them.
>>>
>>>You know, if there ever was a programmer who cared about the speed of his code,
>>>then that would be a chess programmer. :)
>>>
>>>-S.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
