Computer Chess Club Archives

Subject: Re: hyper-threading at dual xeon 2.8Ghz

Author: Robert Hyatt
Date: 12:48:40 02/26/03
On February 25, 2003 at 09:35:56, Vincent Diepeveen wrote:

>On February 25, 2003 at 08:56:27, Robert Hyatt wrote:
>
>i wonder what kind of a person you are. i even give you a 20% speedup figure
>which i measured at 32 processors.


20% for what program on what hardware?  Crafty on NUMA?  So what?  It isn't
designed for NUMA.  But the locks are _not_ the problem on NUMA for Crafty; what
is in memory, and where that memory is, _is_ important.



>
>the global concept of crafty to lock everything global is *wrong* of course. a
>wrong concept for SMT/HT, which i am not using at all. In fact you copy like 2KB
>up to 44KB or so to split, versus diep only a move list. Also much better for
>HT/SMT.
>

Totally irrelevant for SMT.  But since you don't seem to understand it, have
been claiming that it doesn't work, and even claimed that no Xeon supports it,
and that no 2.8 Xeons are even available, I suppose such nonsense is to be
expected...

The global lock isn't a problem because it isn't taken very frequently.  As I
have said, a split is typically done fewer than 1000 times per 3-minute move,
and occasionally jumps to 2000-3000 splits per move.  Even if a split took 10ms,
which it doesn't, that is not a _huge_ loss of time.
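
As a rough check on the figures above, the worst case can be worked out
directly.  The sketch below is illustrative only: the function name and its
parameters are hypothetical and not from Crafty, and the 10ms cost per split is
the deliberately pessimistic figure quoted above, not a measurement.

```c
#include <assert.h>

/* Fraction of one move's time spent on split overhead.
 * splits       - number of splits during the move
 * cost_ms      - assumed cost of one split, in milliseconds
 * move_seconds - time spent on the move, in seconds */
double split_overhead_fraction(int splits, double cost_ms, double move_seconds)
{
    assert(move_seconds > 0.0);
    return (splits * cost_ms / 1000.0) / move_seconds;
}
```

With 1000 splits at 10ms each in a 180-second move the result is about 0.056
(5.6%); with 3000 splits it is about 0.167.  Since the real cost per split is
stated to be well below 10ms, the actual overhead shrinks proportionally.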

You spend too much time talking about things you don't understand.  You should
spend more time trying to speed up the things that hurt performance, not the
things that are lost in the noise.




>Let's quote a statement from a professional computerchessprogrammer (i didn't do
>it but i subscribe to it otherwise i would not quote it, the statement was
>subscribed to by majority of programmers attending IPCCC2003): "the problem of
>crafty is that it first needs to get a lot more efficient single cpu before we
>can go discuss other things such as parallel speedup numbers".


What does that mean?  That my program has an "inefficient" search?  Seems to be
doing just fine for me.  Seems to be doing just fine against yours on ICC.
Perhaps you should be making _yours_ more efficient?  But that seems to be your
way of "operating".  What you have is no good, so you would rather spend time
criticizing others than spend time trying to fix _your_ problems.

My search does just fine, whether it be on one processor or on four.  Your
stupid comments on ICC, such as "Wonder what kind of dubious pruning bob is doing
to get 14 plies here", are just garbage, as the _only_ pruning in Crafty is
null-move, as I have told you a dozen times.

So always "put the blame on the other program, even if it stomps yours in
games."  That is _one_ way to carry on...






>
>Best regards,
>Vincent
>
>>On February 25, 2003 at 07:44:23, Vincent Diepeveen wrote:
>>
>>>On February 23, 2003 at 01:38:55, Matt Taylor wrote:
>>>
>>>DIEP is spinning and locking way way less than Crafty. Note that
>>>it is pretty hard to do without spinning under linux.
>>
>>1.  It is not "hard to do" under linux.  Default pthread_mutex_lock() doesn't
>>spin, the process blocks.  But that is inefficient if the lock is only held
>>for a few instructions.
>>
>>2.  My lock overhead is not very significant.  From actual measurements rather
>>than guesswork.
>>
>>>
>>>The runqueue fires at 100Hz in linux. So the latency for a thread that doesn't
>>>search and normally is doing all kind of stuff is around 10ms under linux.
>>
>>That is wrong.  The run-queue "fires" whenever a process releases a lock that
>>another process is waiting on, if there is an idle processor.
>>
>>>
>>>For crafty 10ms latency is too much to wait for a thread to get fired for sure.
>>
>>
>>Yes, but there is no 10ms latency.
>>
>>>
>>>I guess you didn't try to figure out what the cost of it is, otherwise you would
>>>not write such unprofessional comments like below.
>>
>>
>>Right.  I guess you haven't tested _anything_ or you wouldn't write such
>>nonsense as above???
>>
>>
>>>
>>>In DIEP under linux i do not idle either. Of course for me 10ms is too expensive
>>>too. Instead i generate a bunch of attacktables, so an idle process doesn't
>>>hammer at the same cache line like crafty does.
>>
>>Hammering the same cache line is _very_ efficient, sorry, that is the point
>>for a "shadow lock" in fact.
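
For readers unfamiliar with the idea, the following is a minimal sketch of
spinning on a plain read before attempting the atomic exchange (often called
test-and-test-and-set), assuming that is what "shadow lock" means here.  It uses
C11 atomics; the ttas_* names are hypothetical and this is not Crafty's actual
lock code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct { atomic_int held; } ttas_lock;

void ttas_init(ttas_lock *l)
{
    assert(l != NULL);
    atomic_init(&l->held, 0);
}

/* Returns 1 if the lock was acquired, 0 if it is held by someone else. */
int ttas_try_acquire(ttas_lock *l)
{
    /* Cheap read first: while the lock is held this is served from the
     * waiter's own cached copy of the line. */
    if (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
        return 0;
    /* Only when the lock looks free do we issue the atomic exchange,
     * which needs exclusive ownership of the cache line. */
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

void ttas_acquire(ttas_lock *l)
{
    while (!ttas_try_acquire(l)) {
        /* spin */
    }
}

void ttas_release(ttas_lock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

While the lock is held, waiters only read, so the line can stay shared in each
waiter's cache; the expensive exchange is attempted only after the holder's
release invalidates those copies.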
>>
>>
>>
>>
>>>
>>>It speeds DIEP up 20% (in nodes a second) at 32 processors when i do not take
>>>the 10ms penalty but go for doing something with the registers without hurting
>>>shared cache lines (so just local allocated stuff).
>>
>>There is no 10ms penalty in linux, so I have absolutely no idea what you are
>>talking about.  If a process unblocks and there is an idle processor, that
>>processor starts to work _immediately_, not after 10ms.  Where you got that I
>>have no idea.
>>
>>
>>>
>>>Under windows the runqueue fires at 500Hz, so that's 2ms latency. Still a lot,
>>>but a lot less than 10ms latency. Today i go test what the effect of that is for
>>>DIEP. I have no dual Xeon to my avail at the moment to test it though. Must do
>>>with a dual K7 and dual P3 and see what generating 600 attacktables (about 0.5
>>>ms at the dual k7) just in local ram is going to give versus using
>>>WaitForSingleObject.
>>>
>>>So letting threads idle instead of letting them spin is a completely
>>>pathetic idea for realtime environments.
>>
>>
>>And of course you didn't answer the question:  "Did you modify your spinlocks
>>and spinwaits to use the pause instruction so that hyper-threading works
>>efficiently when one of the two logical cpus is spinning?"
>>
>>I know it is "unprofessional" to ask a technically precise question that is
>>important to the thread being discussed.  But I guess I couldn't help myself.
>>After all, I thought that there should be _some_ technical merit in a thread
>>you post in.
>>
>>The spinwait/spinlock problem is well-known.  It's been discussed in a paper
>>on the Intel web site.  All you had to do was read it, or follow the
>>discussions here, or look at my spinlock code, to see what the problem is, and
>>how to fix it...
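
A minimal sketch of the kind of change being asked about, assuming an x86
compiler that provides the _mm_pause() intrinsic (which emits the pause
instruction); on other targets the cpu_relax() macro below falls back to a
no-op.  The names cpu_relax and spin_wait, and the max_spins bound, are
illustrative and are not taken from Crafty or DIEP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* hint: this logical cpu is only spinning */
#else
#define cpu_relax() ((void)0)     /* assumption: no equivalent hint available */
#endif

/* Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spun before the flag was seen,
 * or -1 if the bound was reached first. */
long spin_wait(atomic_int *flag, long max_spins)
{
    assert(flag != NULL);
    for (long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return i;
        cpu_relax();              /* yield execution resources to the sibling */
    }
    return -1;
}
```

The pause hint sits inside the wait loop, so a spinning logical cpu gives up
execution resources to the other logical cpu on the same physical core instead
of competing with it at full speed.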
>>
>>
>>
>>>
>>>>On February 23, 2003 at 00:39:34, Robert Hyatt wrote:
>>>>
>>>>>On February 22, 2003 at 02:54:21, Vincent Diepeveen wrote:
>>>>>
>>>>>>On February 21, 2003 at 19:49:04, David Weber wrote:
>>>>>>
>>>>>>>what chess programs support hyper-threading
>>>>>>
>>>>>>DIEP, Crafty, Fritz.
>>>>>>
>>>>>>for fritz it speeds up 10% node count at 4 threads at a dual Xeon 2.8Ghz
>>>>>>(compared to HT turned off and 2 threads), but chessbase didn't test yet whether
>>>>>>it actually speeds up search depth (according to Mathias who operates fritz
>>>>>>here).
>>>>>>for shredder it does speed up the node counts but not search depth
>>>>>>so it has SMT/HT turned off here at this tournament and runs with 2 threads at a
>>>>>>dual Xeon 2.8Ghz here.
>>>>>
>>>>>
>>>>>Did you make the necessary changes to spinlocks and spinwaits???
>>>>
>>>>Sorry, can't resist a good laugh!
>>>>
>>>>"No, they're not out yet!"
>>>>
>>>>:-)
>>>>
>>>>-Matt