Author: Robert Hyatt
Date: 20:30:20 12/09/02
On December 09, 2002 at 02:09:03, Matt Taylor wrote:

>On December 08, 2002 at 23:58:36, Robert Hyatt wrote:
>
>>On December 08, 2002 at 22:14:44, Jeremiah Penery wrote:
>>
>>>On December 08, 2002 at 21:24:52, Robert Hyatt wrote:
>>>
>>>>I also think that the numbers for the HT-enabled stuff might be wrong. The
>>>>node counts should vary, because to use HT, it is necessary to run the SMP
>>>>version of crafty and use mt=2 to use both logical processors. That will
>>>>produce variability in the total nodes searched that I didn't see in any of
>>>>the numbers displayed...
>>>
>>>All of Aaron's numbers were single-threaded Crafties. But Steffen posted a
>>>result with HT that used LESS total nodes than the single-threaded program, but
>>>only had a 13% speedup in terms of raw NPS.
>>
>>
>>OK. The only data I have to go on at the moment is my dual 2.8 xeon with
>>HT. I will try to run the benchmark tomorrow and post results. I have to be
>>at the office to do this as the only way I can turn SMT off is to reboot,
>>go into the BIOS setup and turn "logical cpu" off (Dell's syntax, not mine).
>>
>>I got a clean 33% faster in NPS. I didn't check the actual search times
>>very carefully but I will try to do so.
>>
>>I can run 1/2 cpus with SMT off, and 3/4 with SMT on, and compare both NPS
>>and raw time-to-solution times...
>
>Please post detailed results from as many permutations as you can. I'm
>interested in seeing how it changes.
>
>-Matt

Raw data:

I ran four test positions (the last four from the Kopec test set, for no good
reason other than that they were easy to grab for the test). I ran the four
positions using the normal crafty (no pause in the spinlock or spinwait code,
which makes it less efficient than it could be)...

I will give raw nodes searched per second as an estimate of hardware speed.
This ignores parallel search efficiency, which is not a hardware issue here.
I ran the tests with 1 cpu and 2 cpus with SMT disabled, then enabled SMT and
ran them with 3 and 4 threads. Here are the results (the X figure is the
speedup over the one-cpu NPS for the same position):

one cpu:

1. 1217K nps
2. 956K nps
3. 937K nps
4. 887K nps

two cpus:

1. 1889K nps 1.55X
2. 1555K nps 1.63X
3. 1530K nps 1.63X
4. 1444K nps 1.62X

three cpus (SMT on):

1. 2151K nps 1.76X
2. 1780K nps 1.86X
3. 1730K nps 1.84X
4. 1620K nps 1.83X

four cpus (SMT on):

1. 2275K nps 1.86X
2. 1888K nps 1.97X
3. 1846K nps 1.97X
4. 1683K nps 1.90X

Those are a first cut. I am pretty sure that the Pause() fix will add some
improvement to the parallel NPS, particularly for the last two with SMT turned
on...
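
For anyone who wants to recompute the ratios, here is a minimal sketch in Python. The NPS figures are copied from the results above; the dictionary layout and the function name `speedups` are just illustrative choices, not anything from crafty. It rounds to two decimals, so a few values may differ from the posted ones in the last digit.

```python
# NPS figures in thousands, copied from the results in this post.
# The keys are illustrative labels, not crafty settings.
nps = {
    "1 cpu": [1217, 956, 937, 887],
    "2 cpus": [1889, 1555, 1530, 1444],
    "3 threads (SMT on)": [2151, 1780, 1730, 1620],
    "4 threads (SMT on)": [2275, 1888, 1846, 1683],
}


def speedups(config, baseline="1 cpu"):
    """Per-position NPS ratio of `config` over `baseline`, rounded to 2 places."""
    return [round(n / b, 2) for n, b in zip(nps[config], nps[baseline])]


if __name__ == "__main__":
    # Speedup of every configuration over a single cpu.
    for config in nps:
        if config != "1 cpu":
            print(config, speedups(config))
    # What SMT adds on top of two physical cpus.
    print("4 threads vs 2 cpus", speedups("4 threads (SMT on)", baseline="2 cpus"))
```

Passing `baseline="2 cpus"` isolates what SMT contributes beyond the two physical processors, which is the comparison the thread is really about.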
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.