Computer Chess Club Archives



Subject: Re: Try RC5 w/ HT

Author: Robert Hyatt

Date: 19:58:51 05/23/03



On May 22, 2003 at 23:29:25, Aaron Gordon wrote:

>On May 22, 2003 at 22:24:29, Robert Hyatt wrote:
>
>>On May 22, 2003 at 13:43:55, Tom Kerrigan wrote:
>>
>>>On May 21, 2003 at 22:20:57, Robert Hyatt wrote:
>>>
>>>>On May 21, 2003 at 15:48:46, Tom Kerrigan wrote:
>>>>
>>>>>On May 21, 2003 at 13:46:26, Robert Hyatt wrote:
>>>>>
>>>>>>On May 20, 2003 at 13:52:01, Tom Kerrigan wrote:
>>>>>>
>>>>>>>On May 20, 2003 at 00:26:49, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>Actually it _does_ surprise me.  The basic idea is that HT provides improved
>>>>>>>>resource utilization within the CPU.  IE would you prefer to have a dual 600mhz
>>>>>>>>or a single 1000mhz machine?  I'd generally prefer the dual 600, although for
>>>>>>>
>>>>>>>You're oversimplifying HT. When HT is running two threads, each thread only gets
>>>>>>>half of the core's resources. So instead of your 1GHz vs. dual 600MHz situation,
>>>>>>>what you have is more like a 1GHz Pentium 4 vs. a dual 1GHz Pentium. The dual
>>>>>>>will usually be faster, but in many cases it will be slower, sometimes by a wide
>>>>>>>margin.
>>>>>>
>>>>>>Not quite.  Otherwise how do you explain my NPS _increase_ when using a second
>>>>>>thread on a single physical cpu?
>>>>>>
>>>>>>The issue is that now things can be overlapped and more of the CPU core
>>>>>>gets utilized for a greater percent of the total run-time...
>>>>>>
>>>>>>If it were just 50-50 then there would be _zero_ improvement for perfect
>>>>>>algorithms, and a negative improvement for any algorithm with any overhead
>>>>>>whatsoever...
>>>>>>
>>>>>>And the 50-50 doesn't even hold true for all cases, as my test results have
>>>>>>shown, even though I have yet to find any reason for what is going on...
>>>>>
>>>>>Think a little bit before posting, Bob. I said that the chip's execution
>>>>>resources were evenly split, I didn't say that the chip's performance is evenly
>>>>>split. That's just stupid. You have to figure in how those execution resources
>>>>>are utilized and understand that adding more of these resources gives you
>>>>>diminishing returns.
>>>>>
>>>>>-Tom
>>>>
>>>>
>>>>You should follow your own advice.  If resources are split "50-50" then how
>>>>can _my_ program produce a 70-30 split on occasion?
>>>>
>>>>It simply is _not_ possible.
>>>>
>>>>There is more to this than a simple explanation offers...
>>>
>>>Now you're getting off onto another topic here.
>>>
>>
>>Read backward.  _I_ did not "change the topic".
>>
>>I said that I don't see how it is possible for HT to slow a program down.
>>
>>You said "50-50" resource allocation might be an explanation.
>>
>>I said "that doesn't seem plausible because I have at least one example of
>>two compute-bound threads that don't show a 50-50 balance on SMT."
>>
>>If Eugene is right (and I don't know, as he was not sure and I haven't read
>>anything similar to what he mentioned), that _could_ explain it: some
>>resources might be split 50-50 between the two logical processors even if one
>>could use more than the other due to the particular application being run.
>>However, that seems like a _bad_ design decision if it is true...  And
>>there are probably other plausible explanations as well.  What is the _real_
>>explanation?  That will likely take some time to figure out.
>>
>>
>>>Originally you were saying that it's impossible for HT to slow a program down
>>>unless there was something wrong with the algorithm.
>>
>>And based on testing here, I pretty well stick with that.  I won't say there
>>is _no_ program that will run slower, but I haven't found one myself.  And
>>again, to be clear, we are talking about one program, one thread.  Run on
>>a machine with SMT on and SMT off.  I've run that test repeatedly and can't
>>find any penalty for one thread when turning SMT on.  And I do mean _no
>>penalty_ on anything I have tried.  Kernel builds.  Compiles.  Running
>>Crafty.  Running various compute-bound applications like NAMD, a big monte-carlo
>>simulation, etc...
>>
>>The idea really doesn't make sense, IMHO.
>>
>>
>>>
>>>Now you're back to complaining about your 70-30 split, which is only related to
>>>the original topic because they both involve ratios like "50-50" and "70-30."
>>
>>That 70-30 was used simply to suggest that 50-50 is _not_ a "golden rule" in
>>SMT resource allocation, apparently.  Nothing more.
>>
>>
>>
>>
>>>
>>>-Tom
>
>
>Hyatt, grab distributed.net's RC5-72 client, it supports multiple cpus and
>every dual system I've seen it run on gets an exact 100% increase in
>nodes/second. Now, it only spawns 1 thread per processor & isn't memory
>intensive whatsoever (that I've seen, only CPU clock speed affects results). A
>P4 with HT gets HALF the speed of a P4 w/o HT in some of the results I've seen,
>if you get the time try to verify that for me. I would have figured this would
>have been one of the programs HT would shine at. Complete surprise to me...  If
>you could, grab the linux RC5-72 client at:

What are they measuring?

IE running two copies _should_ see each copy run about 1/2 as fast with SMT
on, since each copy is getting roughly 50% of available cpu core resources
when running the same instruction streams.

Or do you mean something else?
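
To make the measurement question concrete, here is a rough sketch of the arithmetic. The rates are made-up illustrative numbers, not measurements, and the function names are hypothetical. The point is only that a per-copy rate and an aggregate rate tell very different stories:

```python
def aggregate_rate(per_copy_rates):
    """Total work per second across all copies running at once."""
    return sum(per_copy_rates)


def smt_speedup(single_rate, per_copy_rates):
    """Aggregate throughput with SMT on, relative to one copy with SMT off."""
    return aggregate_rate(per_copy_rates) / single_rate


# Hypothetical: one copy running alone does 1,000,000 keys/sec.
single = 1_000_000

# Two copies on one HT cpu, each at exactly half speed (a pure 50-50 split).
even_split = [500_000, 500_000]

# Two copies, each a bit better than half because some work overlaps.
overlapped = [560_000, 560_000]

print(smt_speedup(single, even_split))   # 1.0  -> no gain, no loss overall
print(smt_speedup(single, overlapped))   # 1.12 -> 12% more total throughput
```

If the benchmark reports the rate of _one_ copy, "half the speed" is what a 50-50 split predicts. It only counts as a slowdown if the _sum_ over both copies drops below the single-copy rate.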


>
>ftp://ftp.distributed.net/pub/dcti/current-client/dnetc-linux-x86-elf.tar.gz
>
>For those of you interested in running it in windows, here is the windows bin:
>ftp://ftp.distributed.net/pub/dcti/current-client/dnetc-win32-x86.zip
>
>To run the benchmark, all you need to do is type ./dnetc -benchmark
>This only uses one processor. You can also configure it to display nodes &
>keys/sec as the "live rate", which will use all of the processors
>(automatically). Here are the config files to test RC5 and OGR.
>
>dnetc.ini for RC5-72 showing the 'live rate'
>[parameters]
>id=test@test.com
>
>[misc]
>project-priority=RC5-72,OGR=0
>
>[display]
>progress-indicator=rate
>
>
>dnetc.ini for OGR-25 showing the 'live rate'
>[parameters]
>id=test@test.com
>
>[misc]
>project-priority=OGR,RC5-72=0
>
>[display]
>progress-indicator=rate
>
>
>From what I understand RC5/OGR uses mostly shifting, and from what I've seen the
>P4 is extremely slow at that and HT may further hinder shifting. Just a guess
>anyway. If you'd like some results to compare to, here are some of my Win2k
>results (slightly slower than the linux binary under redhat9):
>
>
>[Apr 03 04:35:29 UTC] RC5-72: Benchmark for core #5 (SS 2-pipe)
>                      0.00:00:17.79 [8,235,738 keys/sec]
>
>[Apr 03 04:36:20 UTC] OGR: Benchmark for core #0 (GARSP 5.13-A)
>                      0.00:00:16.98 [19,330,517 nodes/sec]
>
>This is a single Athlon XP at 2507MHz..
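
For anyone comparing runs with HT on and off, a small script can pull the rates out of benchmark output like the lines quoted above. This is only a sketch: it assumes the bracketed `[N keys/sec]` / `[N nodes/sec]` format shown in that output, and `parse_rate` is a hypothetical helper, not part of the dnetc client.

```python
import re

# Matches the bracketed rate at the end of a result line,
# e.g. "[8,235,738 keys/sec]" or "[19,330,517 nodes/sec]".
RATE_RE = re.compile(r"\[([\d,]+) (keys|nodes)/sec\]")


def parse_rate(line):
    """Return (rate, unit) from a benchmark result line, or None."""
    m = RATE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1).replace(",", "")), m.group(2)


sample = """\
[Apr 03 04:35:29 UTC] RC5-72: Benchmark for core #5 (SS 2-pipe)
                      0.00:00:17.79 [8,235,738 keys/sec]
[Apr 03 04:36:20 UTC] OGR: Benchmark for core #0 (GARSP 5.13-A)
                      0.00:00:16.98 [19,330,517 nodes/sec]
"""

# Keep only the lines that actually carry a rate.
rates = [r for r in map(parse_rate, sample.splitlines()) if r]
print(rates)  # [(8235738, 'keys'), (19330517, 'nodes')]
```

Run it once on the output with HT off and once with HT on, then compare the same unit across the two runs.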




Last modified: Thu, 15 Apr 21 08:11:13 -0700
