Computer Chess Club Archives


Subject: Re: Questions re P4 3.03 with HT ??

Author: Robert Hyatt

Date: 20:30:20 12/09/02


On December 09, 2002 at 02:09:03, Matt Taylor wrote:

>On December 08, 2002 at 23:58:36, Robert Hyatt wrote:
>
>>On December 08, 2002 at 22:14:44, Jeremiah Penery wrote:
>>
>>>On December 08, 2002 at 21:24:52, Robert Hyatt wrote:
>>>
>>>>I also think that the numbers for the HT-enabled stuff might be wrong.  The
>>>>node counts should vary, because to use HT, it is necessary to run the SMP
>>>>version of crafty and use mt=2 to use both logical processors.  That will
>>>>produce variability in the total nodes searched that I didn't see in any of
>>>>the numbers displayed...
>>>
>>>All of Aaron's numbers were single-threaded Crafties.  But Steffen posted a
>>>result with HT that used LESS total nodes than the single-threaded program, but
>>>only had a 13% speedup in terms of raw NPS.
>>
>>
>>OK.  The only data I have to go on at the moment is my dual 2.8 xeon with
>>HT.  I will try to run the benchmark tomorrow and post results.  I have to be
>>at the office to do this as the only way I can turn SMT off is to reboot,
>>go into the BIOS setup and turn "logical cpu" off (Dell's syntax, not mine).
>>
>>I got a clean 33% faster in NPS.  I didn't check the actual search times
>>very carefully but I will try to do so.
>>
>>I can run 1/2 cpus with SMT off, and 3/4 with SMT on, and compare both NPS
>>and raw time-to-solution times...
>
>Please post detailed results from as many permutations as you can. I'm
>interested in seeing how it changes.
>
>-Matt


raw data:

I ran four test positions (the last four from the Kopec test set, for no good
reason other than that they were easy to grab).  I ran the four positions
using the normal crafty (no pause in the spinlock or spinwait code, which is
less efficient than it could be)...

I will give raw nodes searched per second to get a raw estimate of hardware
speed (this ignores parallel search efficiency, which is not a hardware
issue here).  I ran the tests with 1 cpu and 2 cpus with SMT disabled, then
enabled SMT and ran with 3 and 4 threads.  The X figure is the NPS relative to
the one-cpu run on the same position.  Here are the results:

one cpu:

1.    1217K nps
2.     956K nps
3.     937K nps
4.     887K nps

two cpus:

1.    1889K nps     1.55X
2.    1555K nps     1.63X
3.    1530K nps     1.63X
4.    1444K nps     1.62X

three cpus (SMT on):

1.    2151K nps     1.76X
2.    1780K nps     1.86X
3.    1730K nps     1.84X
4.    1620K nps     1.83X

four cpus (SMT on):

1.    2275K nps     1.86X
2.    1888K nps     1.97X
3.    1846K nps     1.97X
4.    1683K nps     1.90X

Those are a first cut.  I am pretty sure that the Pause() fix will add some
improvement to the parallel NPS, particularly for the last two with SMT turned
on...
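
For anyone who wants to check the ratios, here is a minimal sketch that
recomputes each speedup as NPS divided by the one-cpu NPS for the same
position.  The NPS figures are copied from the tables above; the names (nps,
speedups) are made up for this illustration and have nothing to do with
crafty itself.  The output is rounded to two decimals, so the last digit can
differ slightly from a hand-computed figure.

```python
# NPS in thousands, copied from the tables above.
# Key = number of threads; each list holds positions 1..4.
nps = {
    1: [1217, 956, 937, 887],     # one cpu, SMT off
    2: [1889, 1555, 1530, 1444],  # two cpus, SMT off
    3: [2151, 1780, 1730, 1620],  # three threads, SMT on
    4: [2275, 1888, 1846, 1683],  # four threads, SMT on
}


def speedups(nps_by_threads, baseline=1):
    """Return NPS relative to the baseline run, position by position."""
    base = nps_by_threads[baseline]
    return {
        threads: [n / b for n, b in zip(values, base)]
        for threads, values in nps_by_threads.items()
    }


if __name__ == "__main__":
    for threads, ratios in speedups(nps).items():
        print(threads, "  ".join(f"{r:.2f}X" for r in ratios))
```

Changing baseline to 2 gives the gain from turning SMT on relative to the
two-cpu run, which is the number that matters for the HT question.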



