Author: Robert Hyatt
Date: 19:50:24 04/16/04
On April 16, 2004 at 14:22:13, Vincent Diepeveen wrote:

>On April 16, 2004 at 12:40:21, Robert Hyatt wrote:
>
>>On April 16, 2004 at 10:05:04, Vincent Diepeveen wrote:
>>
>>>On April 15, 2004 at 13:10:26, Robert Hyatt wrote:
>>>
>>>>On April 15, 2004 at 12:45:23, Vincent Diepeveen wrote:
>>>>
>>>>>On April 15, 2004 at 09:01:44, Robert Hyatt wrote:
>>>>>
>>>>>>On April 15, 2004 at 06:05:15, Joachim Rang wrote:
>>>>>>
>>>>>>>On April 14, 2004 at 22:49:39, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>I just finished some HT on / HT off tests to see how things have changed in
>>>>>>>>Crafty since some of the recent NUMA-related memory changes that were made.
>>>>>>>>
>>>>>>>>Point 1. HT now speeds Crafty up between 5 and 10% max. A year ago this was
>>>>>>>>30%. What did I learn? Nothing new. Memory waits benefit HT. Eugene and I
>>>>>>>>worked on removing several shared memory interactions, which led to better cache
>>>>>>>>utilization, fewer cache invalidations (very slow) and improved performance a good
>>>>>>>>bit. But at the same time, HT no longer has the excessive memory waits it
>>>>>>>>had before, and so the speedup is not as good.
>>>>>>>>
>>>>>>>>Point 2. HT now actually slows things down due to SMP overhead. I.e., I lose 30%
>>>>>>>>per CPU, roughly, due to SMP overhead. HT now only gives 5-10% back. This is a
>>>>>>>>net loss. I am now running my dual with HT disabled...
>>>>>>>>
>>>>>>>>More as I get more data... Here are two data points, however:
>>>>>>>>
>>>>>>>>pos1. cpus=2 (no HT) NPS = 2.07M time=18.13
>>>>>>>>      cpus=4         NPS = 2.08M time=28.76
>>>>>>>>
>>>>>>>>pos2. cpus=2         NPS = 1.87M time=58.48
>>>>>>>>      cpus=4         NPS = 2.01M time=66.00
>>>>>>>>
>>>>>>>>In the first position HT adds almost nothing in NPS and costs 10 seconds in
>>>>>>>>search overhead. Ugly. Position 2 gives about 5% more NPS, but again the SMP
>>>>>>>>overhead washes that out and there is a net loss. I should run the speedup
>>>>>>>>tests several times, since the NPS numbers don't change much but the speedup
>>>>>>>>could. But this offers enough..
>>>>>>>
>>>>>>>
>>>>>>>On a German board someone posted figures for the Fritzmark of Fritz 8. Fritz
>>>>>>>still gains 25% from HT (in this specific position):
>>>>>>>
>>>>>>>cpus=2 NPS = 2.35
>>>>>>>cpus=4 NPS = 2.95
>>>>>>>
>>>>>>>Unfortunately I have no information about search time.
>>>>>>>
>>>>>>>Does that mean Fritz 8 is poorly optimized?
>>>>>>>
>>>>>>>regards Joachim
>>>>>>
>>>>>>
>>>>>>It means it has some cache issues that can be fixed to speed it up further, yes.
>>>>>
>>>>>Not at all.
>>>>>
>>>>>Fritz is currently hand-optimized P4 assembly. I expect him to be working hard on
>>>>>a hand-optimized Opteron assembly version of Fritz now (probably he has already
>>>>>been working on it for a year by now).
>>>>
>>>>Sorry, but you should stick to topics you know something about. SMT works best
>>>
>>>I guess this is your way of saying: "Sorry, I did not consider that it was a more
>>>efficient program than Crafty, and that its better SMT gain came from doing more
>>>hash lookups than I had taken into account."
>>>
>>>>in programs where there are memory reads/writes that stall a thread. As you
>>>>work out those stalls, SMT gains less. My current numbers clearly show
>>>>this, as opposed to the numbers I (and others) posted when I first got my SMT
>>>>box...
>>>
>>>You do 1 lookup to RAM. He's doing perhaps 3 lookups.
>>>
>>>You should do your math better before commenting on Fritz being inefficiently
>>>programmed.
>>
>>Why don't you quote _exactly_ where I said that.
>>
>>Then we can start the _real_ conversation.
>>
>>Hint:
>>
>>I said "Fritz has some cache issues." That is _all_ I said. Your hyperbole
>>turned that into "inefficiently programmed" as your hyperbole always changes
>>everybody's statements...
>
>You implicitly make it clear to everybody that you think it has fixable problems,
>just like the ones you 'fixed' in Crafty.

So? I've written my fair share of asm stuff over the years. _Each_ new architectural change introduces new performance considerations. That doesn't mean the old program was "inefficiently programmed." It just means it is not yet optimized for the _new_ platform...

>
>This while it is written in assembly for a P4 with 512KB cache, and you had not
>even considered that it might do more hashtable lookups than your Crafty.

Again, so? What does that random information have to do with anything here?
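To make the trade-off in the HT on / HT off figures concrete, here is a small Python sketch that recomputes the percentage changes from the numbers posted in this thread. It uses only those posted figures; the helper names (`pct_change`, `summarize`) are hypothetical and made up for illustration. The Fritzmark figures come with no timing data, so only the NPS change is computed for them.

```python
# Figures as posted in the thread:
# (label, NPS with HT off, NPS with HT on, time HT off, time HT on).
# NPS units are as posted (Crafty in millions, Fritzmark unit unstated);
# only ratios matter here. Times are seconds; None = not reported.
RUNS = [
    ("Crafty pos1", 2.07, 2.08, 18.13, 28.76),
    ("Crafty pos2", 1.87, 2.01, 58.48, 66.00),
    ("Fritz 8 Fritzmark", 2.35, 2.95, None, None),
]


def pct_change(before, after):
    """Percentage change going from `before` to `after`."""
    return (after / before - 1.0) * 100.0


def summarize(label, nps_off, nps_on, t_off, t_on):
    """One-line summary of the NPS change and, if known, the time change."""
    line = f"{label}: NPS {pct_change(nps_off, nps_on):+.1f}%"
    if t_off is not None and t_on is not None:
        # Positive means HT on took longer to finish the same search.
        line += f", time-to-solution {pct_change(t_off, t_on):+.1f}%"
    return line


if __name__ == "__main__":
    for run in RUNS:
        print(summarize(*run))
```

For the two Crafty positions the NPS gain (about 0.5% and 7.5%) is outweighed by the longer search time (about 59% and 13%), which is the net loss described in the post. The Fritzmark NPS gain works out to about 25.5%, but without search times it says nothing about the net effect.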
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.