Computer Chess Club Archives

Subject: Re: SURPRISING RESULTS P4 Xeon dual 2.8Ghz

Author: Matt Taylor

Date: 12:38:11 12/17/02

On December 17, 2002 at 15:32:43, Gian-Carlo Pascutto wrote:

>On December 17, 2002 at 14:47:43, Matt Taylor wrote:
>
>>On December 17, 2002 at 14:38:22, Gian-Carlo Pascutto wrote:
>>
>>>On December 17, 2002 at 12:50:48, Matt Taylor wrote:
>>>
>>>>Actually it doesn't work like that. The CPU has an existing bandwidth of 3
>>>>micro-ops/cycle.
>>>
>>>I was under the impression the P4 was much more limited than that
>>>(don't remember the details though).
>>
>>1 micro-op/cycle from the decoder to the trace cache, 3 micro-ops/cycle from the
>>trace cache to the execution units. They get around the former limitation by
>>caching the decoded output.
>
>Yes, this is what I remembered indeed. 8000 microops is going to be
>a *tight* space to fit an entire chessprogram in.
>
>If you're running out of that cache, there's no way to make use of the
>HT, is there?

Yes and no. It depends on a lot of things. One obvious optimization involves
threading. If you have 2 threads, they probably share the same code base, and
they stand to benefit since they share that cache. It wouldn't create a lot of
extra contention; rather the opposite -- you already have the micro-ops decoded,
and you get parts of the second thread for free.

If you run two separate threads that execute different code, they'll compete for
the trace cache, the small 8K L1 data cache, the decoder, and other limited
resources. You won't gain in this case, but you don't really stand to lose until
you start evicting trace cache code that you need.

It's actually a 12K micro-op cache, and the average x86 instruction decodes to
somewhere between 1 and 2 micro-ops, plus an extra micro-op for the load or
store if the instruction uses memory.
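
To put rough numbers on that, here is a back-of-the-envelope sketch of how many x86 instructions the trace cache might hold. It assumes "12K" means 12 * 1024 micro-ops and models the cost per instruction as a base of 1 to 2 micro-ops plus one more for the fraction of instructions that touch memory. The function name and the `mem_fraction` parameter are made up for illustration, and the model ignores how traces are actually built and stored.

```python
# Assumption: "12K micro-ops" is taken as 12 * 1024 entries.
TRACE_CACHE_UOPS = 12 * 1024


def instructions_that_fit(base_uops, mem_fraction, capacity=TRACE_CACHE_UOPS):
    """Estimate how many x86 instructions fit in the trace cache.

    base_uops    -- average micro-ops per instruction before memory (1 to 2)
    mem_fraction -- assumed share of instructions with a load or store (0 to 1),
                    each costing one extra micro-op
    """
    uops_per_instruction = base_uops + mem_fraction
    return int(capacity / uops_per_instruction)


if __name__ == "__main__":
    # Best case: 1 micro-op each, no memory operands.
    print(instructions_that_fit(1.0, 0.0))
    # Middle of the road: 1.5 micro-ops, half the instructions touch memory.
    print(instructions_that_fit(1.5, 0.5))
    # Worst case: 2 micro-ops each, every instruction touches memory.
    print(instructions_that_fit(2.0, 1.0))
```

Under these assumptions the estimate lands somewhere between roughly 4,000 and 12,000 instructions, so what matters is whether the hot part of the search fits, not the whole program.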

>>>>Now, I am no parallel researcher, but even my parallel code doesn't suffer
>>>>overheads so large that it can't gain from HT.
>>>
>>>Depends on what the problem is.
>>
>>You mean it depends on whether or not it's a parallel problem.
>
>Some problems are harder to parallelize with good efficiency.
>
>--
>GCP

Problems that are mostly serial will not parallelize efficiently.
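
One standard way to quantify this is Amdahl's law, which the thread does not name but which models the same idea: if a fraction p of the work can run in parallel on n processors, the best possible speedup is 1 / ((1 - p) + p / n). A minimal sketch follows; the function name is made up for illustration, and treating HT as two full processors is a generous assumption since the logical CPUs share execution units.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction -- share of the work that can run in parallel (0 to 1)
    processors        -- number of processors (n >= 1)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # A mostly serial problem barely benefits from a second processor.
    print(amdahl_speedup(0.1, 2))
    # A mostly parallel problem gets close to the ideal factor of 2.
    print(amdahl_speedup(0.9, 2))
```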

Last modified: Thu, 15 Apr 21 08:11:13 -0700
