Computer Chess Club Archives


Subject: Re: SURPRISING RESULTS P4 Xeon dual 2.8Ghz

Author: Robert Hyatt

Date: 10:48:52 12/17/02


On December 17, 2002 at 13:30:41, Matt Taylor wrote:

>On December 17, 2002 at 13:25:35, Robert Hyatt wrote:
>
>>On December 17, 2002 at 13:22:47, Matt Taylor wrote:
>>
>>>On December 17, 2002 at 12:09:22, Gian-Carlo Pascutto wrote:
>>>
>>>>On December 17, 2002 at 12:03:38, Matt Taylor wrote:
>>>>
>>>>>Actually I based it on data that Dr. Hyatt posted previously. The data Vincent
>>>>>has for his program doesn't show such wonderful gains.
>>>>
>>>>Ok - first see my above post; I looked at the wrong log. I don't
>>>>have data for this exact comparison yet, so you may be right - or not.
>>>>
>>>>I don't trust any data that is produced by either of the two
>>>>so I prefer to run my own tests.
>>>>
>>>>>>>but it's been optimized for HT.
>>>>>>
>>>>>>It's not - even Robert will tell you this.
>>>>>
>>>>>Ok, it's been optimized for Pentium 4, which is -almost- the same thing. If it
>>>>>runs well on P4, it runs well with HT because it will fall into an I/O burst
>>>>>cycle.
>>>>
>>>>Can you explain this last sentence?
>>>>
>>>>--
>>>>GCP
>>>
>>>I/O burst cycle is a concept from operating systems. Programs do a bit of work
>>>and then they do some form of I/O. It makes sense; you click your mouse, some
>>>code determines what you clicked and what to do in response, and then it spits
>>>it back out at you in the form of output.
>>>
>>>You can think of memory accesses also occurring according to an I/O burst cycle,
>>>though it is less pronounced. Generally, the CPU needs data to be in its
>>>registers to manipulate it. Code will load its data, say a matrix, do the
>>>manipulations, and store the results. It's like a miniature I/O burst cycle.
>>>
>>>The point I am making is that one thread will be busy doing I/O (which is slow)
>>>while another thread gets to do real work. They'll alternate like that. Ideally
>>>you get a 100% speed-up from HT because their cycles "hug" each other -- one
>>>comes out of I/O just as the other goes into I/O. You don't get 100% speed-up in
>>>most HT cases because that doesn't happen very often.
>>>
>>>I was wrong, however, as I was under the impression that Eugene had put some
>>>hand-tweaked code into the version that ran those benchmarks.
>>>
>>>-Matt
>>
>>
>>Not that I am aware of.  He said "standard crafty".  I've been plugging with the
>>pause and have it working, but it was a very minor gain.  I started to study why
>>and realized that my big "time burner" is a spinwait (not a spinlock) and it is
>>testing several volatile values per cycle.  This prevents saturating the pipe
>>with multiple iterations of the spin loop and taking the out-of-order penalty
>>that causes...  As well as not burning the cpu pipeline horribly since a lot
>>of cache traffic occurs...
>
>Can't you still get gains with pause? You could even try doubling it up. Intel
>says it's implemented as an actual delay on Xeons.
>
>-Matt


Apparently for the way I do spin-wait, it just doesn't help much.  I saw a
1-2% improvement in some cases, but not even that overall...  Of course I don't
spend a lot of time in that spin-wait either, as when Crafty reports 399% CPU
utilization, that means only 1% of one CPU was wasted spinning somewhere.  I
have tried to minimize such spins, and perhaps they are simply not so important.
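
For readers who have not seen the pattern, here is a minimal sketch of a spin-wait that tests several flags per pass and issues a pause between passes. It is not Crafty's code: the struct, the flag names, the `cpu_relax` macro and the `max_spins` bound are all hypothetical, and it uses C11 atomics where the post describes plain volatile values. On non-x86 builds the pause is assumed to be a no-op.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)     /* assumption: no hint on other targets */
#endif

/* Hypothetical flags an idle search thread might watch. */
typedef struct {
    atomic_int work_ready;  /* another thread has handed over work */
    atomic_int stop;        /* the current search should be abandoned */
    atomic_int quit;        /* the thread should exit */
} spin_state;

typedef enum { WAKE_WORK, WAKE_STOP, WAKE_QUIT, WAKE_TIMEOUT } wake_reason;

/*
 * Spin until one of the flags becomes nonzero or max_spins passes elapse.
 * Each pass loads all three flags, then pauses once. The number of
 * completed passes is stored in *spins_out when it is not NULL.
 */
wake_reason spin_wait(spin_state *s, unsigned long max_spins,
                      unsigned long *spins_out)
{
    unsigned long spins = 0;
    wake_reason reason = WAKE_TIMEOUT;

    assert(s != NULL);
    while (spins < max_spins) {
        if (atomic_load_explicit(&s->quit, memory_order_acquire)) {
            reason = WAKE_QUIT;
            break;
        }
        if (atomic_load_explicit(&s->stop, memory_order_acquire)) {
            reason = WAKE_STOP;
            break;
        }
        if (atomic_load_explicit(&s->work_ready, memory_order_acquire)) {
            reason = WAKE_WORK;
            break;
        }
        cpu_relax();  /* hint to the core that this is a spin loop */
        spins++;
    }
    if (spins_out != NULL)
        *spins_out = spins;
    return reason;
}
```

A real waiter would normally spin without a bound or fall back to blocking; the bound here only keeps the sketch easy to exercise from a single thread. Since each pass already performs several loads before looping, the pause is only one part of the loop body, which fits the small gain reported above.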



Last modified: Thu, 15 Apr 21 08:11:13 -0700