Computer Chess Club Archives


Subject: Re: Microsoft Visual C++ Toolkit 2003

Author: Vincent Diepeveen

Date: 06:09:09 12/25/05

On December 23, 2005 at 16:25:16, William Kerr wrote:

>Hi All,
>
>Just some results from using Microsoft's free Visual C++ Toolkit 2003. I have a
>test program called TREE.cpp, a tree-searching program that uses alpha/beta,
>the killer heuristic and iterative deepening to search a tree. It is very
>similar to chess; in fact, I also use this code in my chess program. I used
>to compile it with Microsoft VC++ 6.0 using the default settings. I then used
>Microsoft's free Visual C++ Toolkit 2003 to compile TREE.cpp and got a 3-to-1
>speed improvement. With Visual C++ Toolkit 2003 I compile for speed, targeting
>Intel/AMD.
>
>One interesting observation is the number of clock ticks per node. An Intel P4
>3.4 GHz takes 91 clock ticks per node; a 1.73 GHz Centrino takes 40 clock ticks
>per node. The Centrino executes 43,186,000 nodes per second, whereas the P4
>3.4 GHz executes only 37,325,000 nodes per second. By comparison, an AMD XP 3000+
>running at 2.16 GHz executes 44,868,000 nodes per second with 48 clock ticks per
>node.
>
>Something to ponder
>Bill
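
The clock-ticks-per-node figures quoted above follow directly from clock
frequency divided by nodes per second. A minimal C sketch that reproduces them
from the numbers in Bill's post (the function and struct names are made up for
illustration; call print_table() from a main of your own):

```c
#include <stdio.h>

/* Cycles spent per searched node = clock frequency (Hz) / nodes per second. */
static double cycles_per_node(double clock_hz, double nodes_per_sec)
{
    return clock_hz / nodes_per_sec;
}

/* Hypothetical helper type: one row of the benchmark results. */
struct machine {
    const char *name;
    double clock_hz;
    double nps;
};

static void print_table(void)
{
    /* Figures taken from the TREE.cpp results quoted above. */
    static const struct machine m[] = {
        { "Pentium 4 3.4 GHz",        3.40e9, 37325000.0 },
        { "Centrino 1.73 GHz",        1.73e9, 43186000.0 },
        { "Athlon XP 3000+ 2.16 GHz", 2.16e9, 44868000.0 },
    };
    for (size_t i = 0; i < sizeof m / sizeof m[0]; i++)
        printf("%-26s %5.1f cycles/node\n",
               m[i].name, cycles_per_node(m[i].clock_hz, m[i].nps));
}
```

It prints roughly 91.1, 40.1 and 48.1 cycles per node, matching the rounded
91, 40 and 48 in the post.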

It is very well known that even an outdated K7 is faster than a P4 Prescott.

It really matters which core of the P4 you use. The old P4EE 3.2 GHz had a far
better cache subsystem than the cheaper and newer P4 3.4 GHz. These CPUs are
very complex, so the few things I write down here are not the only reasons why
the K7 and K8 are faster.

The easiest is to describe the K8, as that one is newer.
The internal bandwidth of the K8 is vastly superior to that of the P4.
The important conclusion is that its caches are bigger and faster.

For such a simple program the size of the L1 probably doesn't matter that much
yet, but for modern chess programs it matters a lot.

The P4EE has an 8 KB L1 data cache, whereas the Opteron (K8) has a 64 KB L1
data cache. The P4 Prescott 3.4 GHz has a 16 KB L1 data cache, but getting one
element out of it costs 4 cycles, and it can only deliver one element from the
L1 cache at a time.

The K8 can get two elements out of its L1 cache simultaneously, at a cost of 3
cycles.
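
To make the difference concrete, here is a toy cost model (an assumption for
illustration, not a simulation of either core): a chain of n loads where each
address depends on the previous load costs n times the latency, while n
independent loads pipeline through the available load ports. The latencies and
port counts plugged in come from the figures above; the function names are
hypothetical.

```c
/* Toy model, not a cycle-accurate simulator.
 * Dependent chain: every load waits for the previous one to finish. */
static unsigned long dependent_load_cycles(unsigned long n, unsigned latency)
{
    return n * latency;
}

/* Independent loads: `ports` loads start per cycle, and the last one still
 * needs the full latency to complete. */
static unsigned long independent_load_cycles(unsigned long n, unsigned latency,
                                             unsigned ports)
{
    if (n == 0 || ports == 0)
        return 0;
    return (n + ports - 1) / ports + latency - 1;
}
```

Under this model 1000 dependent loads take 4000 cycles on a 4-cycle,
single-port L1 versus 3000 on a 3-cycle L1, and 1000 independent loads take
1003 versus 502 cycles once the second load port is counted.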

Then there is branch prediction. For a simple chess program this will matter,
but if we look at the world top 50 programs it matters a hell of a lot more.
The branch misprediction penalty on the P4 is 20 cycles at a minimum, more like
30+ cycles on average. On the K8 it is not clear what it is, but it is probably
lower than that on average.
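
A quick way to put that penalty in perspective: the expected cost per node is
branches per node times misprediction rate times penalty. The 20 branches per
node and 5% misprediction rate used below are made-up illustrative inputs, not
measurements of any program.

```c
/* Expected cycles lost to branch mispredictions per searched node.
 * All three inputs are assumptions supplied by the caller. */
static double mispredict_cycles_per_node(double branches_per_node,
                                         double mispredict_rate,
                                         double penalty_cycles)
{
    return branches_per_node * mispredict_rate * penalty_cycles;
}
```

With those hypothetical inputs, one mispredicted branch per node costs about 30
cycles at a 30-cycle penalty and about 20 cycles at the 20-cycle minimum, a
noticeable slice of a budget of 40 to 91 cycles per node.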

To keep branches from getting this 'death penalty', the processor contains
sophisticated logic called the branch prediction unit; in Intel jargon, the BTB
(branch target buffer). Its size in the P3 is 512 entries, if I remember well.
In the K7 it is 2048 entries, and in the K8 even 16384 entries. You know, I
didn't even look up its size for the P4. Something really tiny by today's
standards; my blind guess is 2048 entries.
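
Why the number of entries matters can be seen in a toy model of a branch target
buffer: a direct-mapped table indexed by the low bits of the branch address,
where two branches that map to the same slot evict each other. This is a
simplified, hypothetical sketch (2-bit saturating counter per entry, no
associativity or global history), not how any of these CPUs actually implement
it; the 2048-entry size is just the K7 figure from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 2048u /* must be a power of two */

struct btb_entry {
    uint64_t tag;     /* full branch address, to detect conflicts */
    uint64_t target;  /* last seen taken target */
    uint8_t counter;  /* 2-bit saturating: 0-1 not taken, 2-3 taken */
    bool valid;
};

static struct btb_entry btb[BTB_ENTRIES];

/* Returns true and sets *target if the branch at pc is predicted taken. */
static bool btb_predict(uint64_t pc, uint64_t *target)
{
    const struct btb_entry *e = &btb[pc & (BTB_ENTRIES - 1)];
    if (e->valid && e->tag == pc && e->counter >= 2) {
        *target = e->target;
        return true;
    }
    return false; /* miss or predicted not taken: fetch falls through */
}

/* Trains the entry once the branch has actually resolved. */
static void btb_update(uint64_t pc, bool taken, uint64_t target)
{
    struct btb_entry *e = &btb[pc & (BTB_ENTRIES - 1)];
    if (!e->valid || e->tag != pc) {
        if (!taken)
            return; /* only allocate entries for taken branches */
        e->valid = true; /* a conflicting branch simply evicts the old one */
        e->tag = pc;
        e->target = target;
        e->counter = 2; /* start as weakly taken */
        return;
    }
    if (taken) {
        if (e->counter < 3)
            e->counter++;
        e->target = target;
    } else if (e->counter > 0) {
        e->counter--;
    }
}
```

Two branches whose addresses differ by a multiple of BTB_ENTRIES keep evicting
each other in this model; a bigger table makes such conflicts rarer in a large
chess program with many hot branches.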

Now we move on to the L2 cache. The L2 cache of the K8 is 1024 KB. Whether it's
512 KB or 1024 KB doesn't matter much for chess; what really matters a lot is
the SPEED at which it can serve data. The K7 has a 20-cycle L2, and you can call
that WEAK. The K8 has a 13-cycle L2, and that's really, really good. Note that
the latency varies, so it's nearly never *exactly* 13 cycles; it depends on
what you do and how. The P4 is more like 30+ cycles for its L2 cache.

What the size of the L3 cache is, or whether there is one at all, is
irrelevant: 95+% of all reads go to the L1 cache anyway.
The rest goes to the L2.
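
Combining the hit rate and latencies above gives a rough average load latency.
This is a simplified two-level model under stated assumptions: exactly 95% of
loads hit L1, every L1 miss hits L2, the L2 figure is the full load-to-use
latency, and memory is ignored, as in the text.

```c
/* Weighted average load latency for a two-level cache model.
 * l2_cycles is taken as the total latency of a load served by L2. */
static double avg_load_latency(double l1_hit_rate, double l1_cycles,
                               double l2_cycles)
{
    return l1_hit_rate * l1_cycles + (1.0 - l1_hit_rate) * l2_cycles;
}
```

With the figures from this post, the K8 (3-cycle L1, 13-cycle L2) comes out at
about 3.5 cycles per load and the Prescott (4-cycle L1, 30-cycle L2) at about
5.3 cycles per load, so even a 5% L1 miss rate makes the slow L2 visible.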

Now let's touch on the next subject: instruction cache and decoding. You know,
it's really sad that I have to mention this. This is a big weakness of all Intel
CPUs, including the Itanium 2. The K8 also keeps instructions in its L2 cache;
none of the Intel CPUs have this, AFAIK. You know, somewhere around the end of
next year Montecito should have this, if they still go ahead and produce that
Itanium 2 CPU. AMD already had this in 2003. It's not a new invention or
anything like that; it's a bitter need, even for huge chess programs. The P4 has
a tiny trace cache, and it can decode only 1 instruction a cycle.

You really must see the P4 as a processor that can, on average, execute 1
instruction a cycle at 3.4 GHz.

If you have such huge penalties everywhere, for L1 and L2, and I haven't even
mentioned memory yet, then what are we talking about, you know?

What is really bad about Intel is their habit of putting nonsense on paper. For
example, if you search, on paper the P4 has a 2-cycle L1 cache, with a very
cleverly mentioned "1 extra cycle". I already mentioned that Prescott has a
4-cycle L1. Intel didn't tell us that; testers figured it out simply by
benchmarking the Prescott core.
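
The usual way testers measure such latencies is pointer chasing: every load's
address comes from the previous load, so the loads cannot overlap. Below is a
minimal POSIX C sketch of that idea (the function name is hypothetical, and it
assumes clock_gettime with CLOCK_MONOTONIC is available). It reports
nanoseconds per load; multiply by the clock in GHz for cycles, so a 4-cycle L1
at 3.4 GHz would show up as roughly 1.2 ns, plus some loop overhead.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <time.h>

/* Written after the timed loop so the compiler must compute the chain. */
static volatile size_t chase_sink;

/* Builds one random cycle over n slots and times `steps` dependent loads.
 * Returns average nanoseconds per load, or -1.0 on bad input or allocation
 * failure. Keep n * sizeof(size_t) below the L1 size to measure L1 latency. */
static double chase_ns_per_load(size_t n, size_t steps)
{
    if (n == 0 || steps == 0)
        return -1.0;
    size_t *next = malloc(n * sizeof *next);
    size_t *order = malloc(n * sizeof *order);
    if (!next || !order) {
        free(next);
        free(order);
        return -1.0;
    }

    /* Fisher-Yates shuffle (rand() is fine for small n) so that the
     * access pattern is not a simple stride the prefetcher can follow. */
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
    /* Link the shuffled slots into a single cycle. */
    for (size_t i = 0; i < n; i++)
        next[order[i]] = order[(i + 1) % n];

    size_t p = order[0];
    free(order);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = next[p]; /* each load depends on the one before it */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    chase_sink = p;
    free(next);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

For example, chase_ns_per_load(1024, 1000000) walks an 8 KB table (assuming an
8-byte size_t), which fits in a 16 KB L1; growing n past the L1 and L2 sizes
exposes the L2 and memory latencies in turn.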

I remember someone who now runs his chess program on a big supercomputer
predicting in advance that his program would get 1 million nodes a second,
hands down, on that 1.6 GHz Itanium 2.

"4 integer execution units, 6 instructions a cycle"

You know, that's paper.

Arturo Ochoa quoted me:
  "Paper supports everything"

That's Intel. On paper they are the greatest.
In reality they are not so great at the moment.

However, I believe that in areas like hardware, things follow a sine-wave
trajectory: the lead keeps changing hands.

At this moment AMD is faster. With the next CPU release at the end of 2006,
Intel may be faster again, as the K8 will be outdated by then.

Of course, please realize that the K8 is a high-end CPU. From Intel, only the
Itanium 2 is a high-end CPU. None of the P4s and Pentium Ms, including the
future Xeon versions of the Pentium M, are really high-end.

The Pentium M-based Xeons as announced might be fast single CPUs, but
please realize how ugly the L2 sharing is.

If you need a fast L2 cache, it becomes an interesting CPU once you run 2
processes at the same time on it, since both share the one L2 cache. How are
they going to keep it working hard in such a case?

Additionally, the L2 communication from one CPU to another (so not from core
to core, as each chip has just one L2 cache, but for example the memory
synchronization mechanism in a quad machine) is really ugly.

So even before launch we can already see performance issues in that
architecture.

Vincent




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.