Computer Chess Club Archives


Subject: Re: hyper-threading at dual xeon 2.8Ghz

Author: Vincent Diepeveen

Date: 06:19:47 02/25/03


On February 25, 2003 at 08:56:27, Robert Hyatt wrote:

Bob, it is 10ms latency. Period.

>On February 25, 2003 at 07:44:23, Vincent Diepeveen wrote:
>
>>On February 23, 2003 at 01:38:55, Matt Taylor wrote:
>>
>>DIEP is spinning and locking way way less than Crafty. Note that
>>it is pretty hard to do without spinning under linux.
>
>1.  It is not "hard to do" under linux.  Default pthread_mutex_lock() doesn't
>spin, the process blocks.  But that is inefficient if the lock is only held for
>a few instructions.
>
>2.  My lock overhead is not very significant.  From actual measurements rather
>than guesswork.
>
>>
>>The runqueue fires at 100Hz in linux. So the latency for a thread that doesn't
>>search and normally is doing all kinds of stuff is around 10ms under linux.
>
>That is wrong.  The run-queue "fires" whenever a process releases a lock that
>another process is waiting on, if there is an idle processor.
>
>>
>>For crafty 10ms latency is too much to wait for a thread to get fired for sure.
>
>
>Yes, but there is no 10ms latency.
>
>>
>>I guess you didn't try to figure out what the cost of it is, otherwise you would
>>not write such unprofessional comments like below.
>
>
>Right.  I guess you haven't tested _anything_ or you wouldn't write such
>nonsense as above???
>
>
>>
>>In DIEP under linux i do not idle either. Of course for me 10ms is too expensive
>>too. Instead i generate a bunch of attacktables, so an idle process doesn't
>>hammer at the same cache line like crafty does.
>
>hammering the same cache line is _very_ efficient, sorry, that is the point of
>a "shadow lock" in fact.
>
>
>
>
>>
>>It speeds DIEP up 20% (in nodes a second) at 32 processors when i do not take
>>the 10ms penalty but go for doing something with the registers without hurting
>>shared cache lines (so just local allocated stuff).
>
>There is no 10ms penalty in linux, so I have absolutely no idea what you are
>talking about.  If a process unblocks and there is an idle processor, that
>processor starts to work _immediately_, not after 10ms.  Where you got that I
>have no idea.
>
>
>>
>>Under windows the runqueue fires at 500Hz, so that's 2ms latency. Still a lot,
>>but a lot less than 10ms latency. Today i will test what the effect of that is
>>for DIEP. I have no dual Xeon available at the moment to test it though. Must do
>>with a dual K7 and dual P3 and see what generating 600 attacktables (about 0.5
>>ms at the dual k7) just in local ram is going to give versus using
>>WaitForSingleObject.
>>
>>So letting threads idle instead of letting them spin is a completely pathetic
>>idea for realtime environments.
>
>
>And of course you didn't answer the question:  "did you modify your spinlocks
>and spinwaits to use the pause instruction so that hyper-threading works
>efficiently when one of the two logical cpus is spinning?"
>
>I know it is "unprofessional" to ask a technically precise question that is
>important to the thread being discussed.  But I guess I couldn't help myself.
>After all I thought that there should be _some_ technical merit in a thread you
>post in.
>
>The spinwait/spinlock problem is well-known.  It's been discussed in a paper on
>the Intel web site.  All you had to do was read it, or follow the discussions
>here, or look at my spinlock code, to see what the problem is, and how to fix
>it...
>
>
>
>>
>>>On February 23, 2003 at 00:39:34, Robert Hyatt wrote:
>>>
>>>>On February 22, 2003 at 02:54:21, Vincent Diepeveen wrote:
>>>>
>>>>>On February 21, 2003 at 19:49:04, David Weber wrote:
>>>>>
>>>>>>what chess programs support hyper-threading
>>>>>
>>>>>DIEP, Crafty, Fritz.
>>>>>
>>>>>for fritz it speeds up 10% node count at 4 threads at a dual Xeon 2.8Ghz
>>>>>(compared to HT turned off and 2 threads), but chessbase didn't test yet whether
>>>>>it actually speeds up search depth (according to Mathias who operates fritz
>>>>>here).
>>>>>for shredder it does speed up the node counts but not search depth
>>>>>so it has SMT/HT turned off here at this tournament and runs with 2 threads at a
>>>>>dual Xeon 2.8Ghz here.
>>>>
>>>>
>>>>Did you make the necessary changes to spinlocks and spinwaits???
>>>
>>>Sorry, can't resist a good laugh!
>>>
>>>"No, they're not out yet!"
>>>
>>>:-)
>>>
>>>-Matt
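
The spinlock/spinwait change asked about in the quoted thread can be sketched roughly as follows. This is a minimal test-and-test-and-set lock in C11 whose wait loop issues the x86 pause instruction, so that a spinning logical cpu gives up execution resources to its hyper-threading sibling. It is not Crafty's or DIEP's actual code. The names tts_lock_t, tts_init, tts_trylock, tts_lock, tts_unlock and the cpu_relax macro are hypothetical, and reading the "shadow lock" remark as "spin on plain loads of the locally cached line" is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* On x86, _mm_pause() emits the pause instruction; on other targets this
   sketch falls back to a no-op (an assumption, not a tuned choice). */
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)
#endif

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct { atomic_int held; } tts_lock_t;

static void tts_init(tts_lock_t *l) { atomic_init(&l->held, 0); }

/* One atomic exchange; returns true if this caller took the lock. */
static bool tts_trylock(tts_lock_t *l) {
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

static void tts_lock(tts_lock_t *l) {
    while (!tts_trylock(l)) {
        /* Wait with plain loads so the line can stay in the local cache
           until the holder writes it; pause between loads. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            cpu_relax();
    }
}

static void tts_unlock(tts_lock_t *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

A caller would wrap a short critical section in tts_lock(&l) and tts_unlock(&l). Whether this helps a given engine on a given machine is something only measurement can show.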


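On the blocking-versus-spinning point in the quoted reply (a blocking lock is inefficient if the lock is only held for a few instructions), one common middle ground is to spin briefly and then fall back to a blocking acquire. The sketch below uses the POSIX calls pthread_mutex_trylock and pthread_mutex_lock. The function name hybrid_lock, the cpu_relax macro and the SPIN_TRIES budget of 1000 are hypothetical and not taken from either program; the budget is not a measured value.

```c
#include <pthread.h>

/* pause on x86, no-op elsewhere (an assumption of this sketch). */
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)
#endif

enum { SPIN_TRIES = 1000 };  /* hypothetical spin budget */

/* Returns 1 if the mutex was taken while spinning, 0 if the caller had to
   block in pthread_mutex_lock, -1 if pthread_mutex_lock reported an error. */
static int hybrid_lock(pthread_mutex_t *m) {
    for (int i = 0; i < SPIN_TRIES; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return 1;
        cpu_relax();
    }
    return pthread_mutex_lock(m) == 0 ? 0 : -1;
}
```

The lock is released with pthread_mutex_unlock as usual. The return value makes it possible to count how often the blocking path is actually taken before drawing conclusions about its cost.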


Last modified: Thu, 15 Apr 21 08:11:13 -0700
