Author: Robert Hyatt
Date: 08:54:42 05/25/04
On May 25, 2004 at 10:12:00, Anthony Cozzie wrote:

>>>>Yes. I have inlined FirstOne()/LastOne() to use the 64 bit AMD BSF/BSR
>>>>instructions. There were several changes dealing with updating global shared
>>>>data that were made to cut down on cache-to-cache (MOESI) transactions and
>>>>overhead. There were changes made to make local data be allocated in a
>>>>processor's local memory to decrease access time. Etc...
>
>I know cache-cache is traditionally slow, but I thought that on the opteron it
>was fast, what with the hypertransport link and all. I need to review my
>multiprocessor notes, but it seems logical that
>
>CPU -> CPU >= CPU -> CPU -> MEMORY
>
>anthony

The problem is the "S" state in the MOESI protocol. If all processors are
updating a memory address that falls in the same cache "line", the cache
controllers get into a real snit. For infrequent shared updates, MOESI
eliminates the cache-to-memory updates (cache-invalidate-reload) and replaces
them with cache-to-cache transactions. But with high-volume updates, the
cache-to-cache traffic begins to cause problems as the simple cache-to-cache
bus gets swamped...

IE the first problem we found was a global counter I update once per node to
determine when to do a "time check" and an "input check". That caused horrible
bottlenecks that were eventually fixed. There were other such examples of
shared things that could, with some effort, be made local with periodic points
of "combination". Every time a "shared" value was removed on the AMD Opteron,
performance went up. By "shared" I mean a 64-byte chunk of memory that gets
modified by more than one processor within a short period of time... The PIV
gets hurt even more because its L2 cache lines are 2x longer still, making the
interference even more likely...
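[Editor's sketch] The per-node counter fix described above can be illustrated roughly as follows. This is not Crafty's actual code; the struct layout, the NTHREADS limit, and the 64-byte line size are assumptions made for the sketch (an Opteron-style cache line), and the names are invented for illustration:

    #include <stdint.h>

    #define CACHE_LINE 64        /* assumed Opteron cache line size       */
    #define NTHREADS   16        /* illustrative maximum thread count     */

    /* BAD: one counter shared by every search thread.  Each increment
       forces the owning cache line to bounce between CPUs, generating
       constant MOESI cache-to-cache traffic. */
    volatile uint64_t global_nodes;

    /* BETTER: give each thread its own counter, padded and aligned so
       that no two counters ever share a 64-byte line.  Each line is then
       written by exactly one processor. */
    struct per_thread {
      uint64_t nodes;
      char pad[CACHE_LINE - sizeof(uint64_t)];
    } __attribute__((aligned(CACHE_LINE)));

    static struct per_thread counters[NTHREADS];

    /* Called once per node by thread 'tid'; purely local, no coherence
       traffic.  A time/input check can be driven off the local counter
       alone (e.g. every N thousand nodes), so nothing shared is touched
       on the per-node path. */
    static inline void count_node(int tid) {
      counters[tid].nodes++;
    }

    /* Periodic "combination" point: sum the local counters only when a
       global total is actually needed (time check, statistics). */
    static uint64_t total_nodes(void) {
      uint64_t sum = 0;
      for (int i = 0; i < NTHREADS; i++)
        sum += counters[i].nodes;
      return sum;
    }

On a PIV, where the effective L2 line is twice as long, the padding in a sketch like this would have to be doubled to get the same isolation.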