Author: Eugene Nalimov
Date: 22:07:26 11/08/03
On November 08, 2003 at 21:40:20, Robert Hyatt wrote:

>On November 08, 2003 at 14:55:15, Robert Hyatt wrote:
>
>>On November 05, 2003 at 18:57:08, Eugene Nalimov wrote:
>>
>>>On November 05, 2003 at 18:17:03, Robert Hyatt wrote:
>>>
>>>>On November 05, 2003 at 16:41:51, Eugene Nalimov wrote:
>>>>
>>>>>On November 05, 2003 at 09:54:13, Robert Hyatt wrote:
>>>>>
>>>>>>On November 05, 2003 at 05:22:16, Ed Schröder wrote:
>>>>>>
>>>>>>>If you the choice between:
>>>>>>>
>>>>>>>1) AMD Opteron 244, 1.8 Ghz, S-940 Box
>>>>>>>
>>>>>>>and:
>>>>>>>
>>>>>>>2) AMD MP 2600+, 266Mhz
>>>>>>>
>>>>>>>then what would be the best choice regarding speed.
>>>>>>>
>>>>>>>I wonder...
>>>>>>>
>>>>>>>Ed
>>>>>>
>>>>>>for me, I'd take the opteron.
>>>>>>
>>>>>>Crafty gets about 2M nps on a 1.8ghz opteron... single processor.
>>>>>
>>>>>Not exactly. Following are 2 log files from (new version of) Crafty running on
>>>>>1.8GHz quad Opteron system. Run time vary from run to run, but those are typical
>>>>>ones
>>>>>
>>>>>1 CPU: 1,762knps
>>>>>4 CPUs: 6,856knps
>>>>
>>>>OK... I had done the calculation wrong. I thought that 6.8M for 4 was
>>>>basically 3.2X faster than 1, due to the NUMA scaling issues. It looks
>>>>from the above that it is now scaling almost 4:1 which is great. :)
>>>>
>>>>Now if my dual xeon would just scale 2.0 :)
>>>
>>>What is current number? I believe we improved it when you made some global
>>>per-thread one, no?
>>>
>>>Thanks,
>>>Eugene
>>
>>
>>Looks better (I just tested.) Seems to be back to the magic
>>1.9X (raw NPS is 1.9X faster with two processors than with
>>1.
>>
>>Here's the raw data.
>>
>>one cpu:
>>
>> time=1:25 cpu=99% mat=0 n=85541805 fh=91% nps=998k
>> time=55.41 cpu=99% mat=0 n=62193826 fh=95% nps=1122k
>> time=1:40 cpu=99% mat=-1 n=89355667 fh=94% nps=886k
>> time=1:18 cpu=99% mat=0 n=82339318 fh=92% nps=1050k
>>
>>two cpus (SMT off):
>> time=49.12 cpu=198% mat=0 n=91626204 fh=91% nps=1865k
>> time=27.55 cpu=198% mat=0 n=58868942 fh=95% nps=2136k
>> time=1:00 cpu=198% mat=-1 n=101092946 fh=94% nps=1669k
>> time=45.56 cpu=197% mat=0 n=89351627 fh=92% nps=1961k
>>
>>four cpus (SMT on):
>> time=50.32 cpu=392% mat=0 n=105665041 fh=91% nps=2099k
>> time=23.92 cpu=388% mat=0 n=57409674 fh=95% nps=2400k
>> time=57.60 cpu=392% mat=-1 n=108568676 fh=93% nps=1884k
>> time=40.88 cpu=396% mat=0 n=91017384 fh=92% nps=2226k
>
>
>I didn't have time to analyze the data above, but I notice that since I have
>been doing the NUMA-specific fixes, which also have to do with cache coherency
>issues, my SMT performance is no longer what it was a while back. IE from
>the raw NPS numbers, it seems to be about 10% faster now with SMT on than off.
>Probably explained by the less frequent cache line loading for a specific shared
>variable that was causing problems earlier... SMT on is still faster with a
>parallel search, for me, but the difference is not as stark as it was 6 months
>ago when this topic came up initially...

I hope your SMT nps didn't get worse, right? It's just your non-SMT nps that went up? If so, the explanation is simple: you now have fewer cache conflicts, so your thread is usually not blocked, and there are fewer "idle" resources left for the other thread to use.

The best SMT numbers I observed were achieved either on programs with lots of unpredictable branches (e.g. a (de)compressor: with a good algorithm the branches are unpredictable, since otherwise there would be some regularity that could be used to obtain a better compression ratio), or on server-like code with *lots* of cache misses (and unpredictable branches as well).

Thanks,
Eugene
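The scaling figures traded in this thread (about 1.9X from one to two CPUs, roughly 10% more NPS with SMT on, and "almost 4:1" on the quad Opteron) can be checked directly from the quoted nps values. Below is a minimal sketch that does that arithmetic; the list names and the `ratios` helper are hypothetical, the numbers are simply copied from the logs in the thread, and it assumes each run lists the same four test positions in the same order. Worked out per position, the dual-CPU ratios come to about 1.87 to 1.90, and the SMT-on gain to about 12% to 14%.

```python
# NPS values (knps) copied from the quoted Crafty logs on the dual Xeon.
# Assumption: each list holds the same four test positions in the same order.
one_cpu = [998, 1122, 886, 1050]
two_cpu_smt_off = [1865, 2136, 1669, 1961]
four_logical_smt_on = [2099, 2400, 1884, 2226]  # same 2 physical CPUs, SMT on


def ratios(base, other):
    """Per-position raw NPS ratio other/base (throughput, not time-to-depth)."""
    return [o / b for b, o in zip(base, other)]


def mean(values):
    return sum(values) / len(values)


# Raw NPS scaling from one to two physical CPUs (SMT off).
dual = ratios(one_cpu, two_cpu_smt_off)
# Extra NPS obtained by enabling SMT on the same two physical CPUs.
smt_gain = ratios(two_cpu_smt_off, four_logical_smt_on)
# Quad Opteron figures from earlier in the thread: 1 CPU vs 4 CPUs.
opteron = 6856 / 1762

print("1 -> 2 CPUs:", [round(r, 2) for r in dual], "mean", round(mean(dual), 2))
print("SMT off -> on:", [round(r, 2) for r in smt_gain], "mean", round(mean(smt_gain), 2))
print("Opteron 1 -> 4 CPUs:", round(opteron, 2))
```

These are raw throughput ratios only: the n= node counts in the logs differ from run to run, so the elapsed times do not shrink by exactly the same factor.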
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.