Author: Matt Taylor
Date: 10:30:41 12/17/02
On December 17, 2002 at 13:25:35, Robert Hyatt wrote:

>On December 17, 2002 at 13:22:47, Matt Taylor wrote:
>
>>On December 17, 2002 at 12:09:22, Gian-Carlo Pascutto wrote:
>>
>>>On December 17, 2002 at 12:03:38, Matt Taylor wrote:
>>>
>>>>Actually I based it on data that Dr. Hyatt posted previously. The data Vincent
>>>>has for his program doesn't show such wonderful gains.
>>>
>>>Ok - first see my above post; I looked at the wrong log. I don't
>>>have data for this exact comparison yet, so you may be right - or not.
>>>
>>>I don't trust any data that is produced by either of the two
>>>so I prefer to run my own tests.
>>>
>>>>>>but it's been optimized for HT.
>>>>>
>>>>>It's not - even Robert will tell you this.
>>>>
>>>>Ok, it's been optimized for Pentium 4, which is -almost- the same thing. If it
>>>>runs well on P4, it runs well with HT because it will fall into an I/O burst
>>>>cycle.
>>>
>>>Can you explain this last sentence?
>>>
>>>--
>>>GCP
>>
>>I/O burst cycle is a concept from operating systems. Programs do a bit of work
>>and then they do some form of I/O. It makes sense; you click your mouse, some
>>code determines what you clicked and what to do in response, and then it spits
>>it back out at you in the form of output.
>>
>>You can think of memory accesses also occurring according to an I/O burst cycle,
>>though it is less pronounced. Generally, the CPU needs data to be in its
>>registers to manipulate it. Code will load its data, say a matrix, do the
>>manipulations, and store the results. It's like a miniature I/O burst cycle.
>>
>>The point I am making is that one thread will be busy doing I/O (which is slow)
>>while another thread gets to do real work. They'll alternate like that. Ideally
>>you get a 100% speed-up from HT because their cycles "hug" each other -- one
>>comes out of I/O just as the other goes into I/O. You don't get 100% speed-up in
>>most HT cases because that doesn't happen very often.
>>
>>I was wrong, however, as I was under the impression that Eugene had put some
>>hand-tweaked code into the version that ran those benchmarks.
>>
>>-Matt
>
>
>Not that I am aware of. He said "standard crafty". I've been plugging with the
>pause and have it working, but it was a very minor gain. I started to study why
>and realized that my big "time burner" is a spinwait (not a spinlock) and it is
>testing several volatile values per cycle. This prevents saturating the pipe
>with multiple iterations of the spin loop and taking the out-of-order penalty
>that causes... As well as not burning the cpu pipeline horribly since a lot
>of cache traffic occurs...

Can't you still get gains with pause? You could even try doubling it up. Intel
says it's implemented as an actual delay on Xeons.

-Matt
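
For anyone following along, here is a rough sketch of the kind of spin-wait being discussed: a loop that tests several volatile values per pass and issues one or more pause instructions between passes. This is not Crafty's code. The function name spin_wait_any, its parameters, the CPU_RELAX macro, and the iteration cap are all made up for illustration, and on non-x86 targets the pause is assumed to compile to nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: _mm_pause() is available through <immintrin.h> on x86
 * compilers; on other architectures this sketch just does nothing. */
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)
#endif

/* Hypothetical helper, not taken from any real engine.
 * Spins until one of the nflags volatile flags becomes nonzero and
 * returns its index, or returns -1 after max_iters passes.
 * pauses_per_iter = 1 is a plain pause loop; 2 "doubles it up". */
static int spin_wait_any(const volatile int *flags, size_t nflags,
                         int pauses_per_iter, long max_iters)
{
    assert(flags != NULL && pauses_per_iter >= 0);

    for (long iter = 0; iter < max_iters; iter++) {
        /* Test every watched value once per pass. */
        for (size_t i = 0; i < nflags; i++) {
            if (flags[i] != 0)
                return (int)i;
        }
        /* Hint to the CPU that this is a spin loop before re-testing. */
        for (int p = 0; p < pauses_per_iter; p++)
            CPU_RELAX();
    }
    return -1; /* gave up: nothing was set within max_iters passes */
}
```

In a real search the flags would be set by other threads and the iteration cap would likely go away; the cap is only there so the sketch terminates when run on its own. Passing 2 for pauses_per_iter corresponds to the "doubling it up" idea, and whether that helps would have to be measured.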
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.