Author: Matt Taylor
Date: 12:38:11 12/17/02
On December 17, 2002 at 15:32:43, Gian-Carlo Pascutto wrote:

>On December 17, 2002 at 14:47:43, Matt Taylor wrote:
>
>>On December 17, 2002 at 14:38:22, Gian-Carlo Pascutto wrote:
>>
>>>On December 17, 2002 at 12:50:48, Matt Taylor wrote:
>>>
>>>>Actually it doesn't work like that. The CPU has an existing bandwidth of 3
>>>>micro-ops/cycle.
>>>
>>>I was under the impression the P4 was much more limited than that
>>>(don't remember the details though).
>>
>>1 micro-op/cycle from the decoder to the trace cache, 3 micro-ops/cycle from the
>>trace cache to the execution units. They get around the former limitation by
>>caching the decoded output.
>
>Yes, this is what I remembered indeed. 8000 microops is going to be
>a *tight* space to fit an entire chessprogram in.
>
>If you're running out of that cache, there's no way to make use of the
>HT, is there?

Yes and no. It depends on a lot of things. One obvious optimization involves
threading. If you have 2 threads, they probably share the same code base, and
they stand to benefit since they share that cache. It wouldn't create a lot of
extra contention; rather the opposite: you already have the micro-ops decoded,
and you get parts of the second thread for free.

If you run two separate threads that don't share code, they'll compete for the
trace cache, the small 8K L1 data cache, the decoder, and other limited
resources. You won't gain in this case, but you don't really stand to lose
until you start evicting trace cache code that you need.

It's actually a 12K micro-op cache, and an average x86 instruction works out to
somewhere between 1 and 2 micro-ops, plus memory. (If the instruction uses
memory, you have an extra micro-op for the load or store.)

>>>>Now, I am no parallel researcher, but even my parallel code doesn't suffer
>>>>overheads so large that it can't gain from HT.
>>>
>>>Depends on what the problem is.
>>
>>You mean it depends on whether or not it's a parallel problem.
>
>Some problems are harder to parallelize with good efficiency.
>
>--
>GCP

Problems that are mostly serial will not parallelize efficiently.
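
To put a rough number on that last point: Amdahl's law (not mentioned anywhere
in this thread, just the usual textbook bound) gives the best speedup you can
hope for when a fixed fraction of the work is serial. The sketch below is only
an illustration. The function name and the sample serial fractions are made up,
and it treats the 2 threads as 2 full processors, which HT's logical processors
are not, so the real gain would be lower still.

```python
def amdahl_speedup(serial_fraction: float, n_threads: int) -> float:
    """Idealized upper bound on speedup (Amdahl's law).

    serial_fraction: share of the run time that cannot be parallelized (0..1).
    n_threads: number of threads, assumed here to run on fully independent
               processors (an optimistic assumption for HT).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part runs at full cost; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


if __name__ == "__main__":
    # Hypothetical serial fractions, chosen only to show the trend.
    for s in (0.1, 0.5, 0.9):
        bound = amdahl_speedup(s, 2)
        print(f"serial fraction {s:.1f}: at most {bound:.2f}x on 2 threads")
```

With half the work serial, the bound on 2 threads is already down to about
1.33x, and with 90% serial it is barely above 1x.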
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.