Author: Robert Hyatt
Date: 06:06:19 04/15/04
On April 15, 2004 at 05:57:31, Joachim Rang wrote:

>On April 14, 2004 at 22:49:39, Robert Hyatt wrote:
>
>>I just finished some HT on / HT off tests to see how things have changed in
>>Crafty since some of the recent NUMA-related memory changes that were made.
>>
>>Point 1. HT now speeds Crafty up between 5 and 10% max. A year ago this was
>>30%. What did I learn? Nothing new. Memory waits benefit HT. Eugene and I
>>worked on removing several shared memory interactions which led to better cache
>>utilization, less cache invalidates (very slow) and improved performance a good
>>bit. But at the same time, now HT doesn't have the excessive memory waits it
>>had before and so the speedup is not as good.
>>
>>Point 2. HT now actually slows things down due to SMP overhead. IE I lose 30%
>>per CPU, roughly, due to SMP overhead. HT now only gives 5-10% back. This is a
>>net loss. I am now running my dual with HT disabled...
>>
>>More as I get more data... Here are two data points however:
>>
>>pos1.  cpus=2 (no HT)  NPS = 2.07M  time=18.13
>>       cpus=4          NPS = 2.08M  time=28.76
>>
>>pos2.  cpus=2          NPS = 1.87M  time=58.48
>>       cpus=4          NPS = 2.01M  time=66.00
>>
>>First pos HT helps almost none in NPS, costs 10 seconds in search overhead.
>>Ugly. Position 2 gives about 5% more nps, but again the SMP overhead washes
>>that out and there is a net loss. I should run the speedup tests several times,
>>but the NPS numbers don't change much, and the speedup could change. But this
>>offers enough..
>
>
>thanks Bob for sharing that information. Does that mean a poorly optimized
>engine should gain more from HT, perhaps even to the point that it pays off?
>
>Can you give an estimate of crafty on a quad- and 8-way-system in gain of nps
>and search time?
>
>thanks in advance
>
>regards Joachim

Think about this: In an operating system, we gain by running two processes at the same time only when both do some I/O so that there is overlap. No I/O and things run no faster. Ditto for SMT.
The more one thread blocks waiting on memory, the more advantage you see with SMT, since a second thread can eat the CPU cycles that would normally be lost waiting on memory loads/stores.

On an 8-way the NPS is about 8x. This is on an 8-way Opteron. Speedup numbers were as I have posted in the past, i.e. about 6x or a little less...
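
For anyone who wants to check the arithmetic on the two data points quoted above, here is a minimal Python sketch. The names (`DATA`, `ht_effect`, `parallel_efficiency`) are made up for illustration and are not part of Crafty. It assumes the time column is seconds to finish the same search, and that NPS times time roughly approximates the number of nodes searched, which is only a back-of-the-envelope estimate.

```python
# Figures copied from the quoted post: (NPS in millions, time in seconds).
# Each entry is (HT off, 2 threads) followed by (HT on, 4 threads).
DATA = {
    "pos1": ((2.07, 18.13), (2.08, 28.76)),
    "pos2": ((1.87, 58.48), (2.01, 66.00)),
}


def ht_effect(ht_off, ht_on):
    """Return (nps_gain, time_ratio, node_ratio) for HT on versus HT off.

    nps_gain   : fractional raw speed increase from turning HT on
    time_ratio : time with HT on divided by time with HT off (>1 is slower)
    node_ratio : approximate tree growth, assuming nodes ~= NPS * time
    """
    nps_off, time_off = ht_off
    nps_on, time_on = ht_on
    nps_gain = nps_on / nps_off - 1.0
    time_ratio = time_on / time_off
    node_ratio = (nps_on * time_on) / (nps_off * time_off)
    return nps_gain, time_ratio, node_ratio


def parallel_efficiency(speedup, cpus):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cpus


if __name__ == "__main__":
    for name, (ht_off, ht_on) in DATA.items():
        gain, t_ratio, n_ratio = ht_effect(ht_off, ht_on)
        print(f"{name}: NPS {gain:+.1%}, time x{t_ratio:.2f}, "
              f"approx nodes x{n_ratio:.2f}")
    # 8-way Opteron figure from the reply: NPS about 8x, speedup about 6x.
    print(f"8-way efficiency at 6x: {parallel_efficiency(6.0, 8):.0%}")
```

In both positions the time ratio is above 1, so the small NPS gain is more than eaten by the larger tree searched with four threads, which matches the net loss described in the quoted post. The same kind of ratio applies to the 8-way numbers: a speedup of about 6x on 8 CPUs is roughly 75% of linear.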
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.