Computer Chess Club Archives


Subject: Re: Hyper-Threading Technology from Intel-to Hype or Not to Hype?

Author: Robert Hyatt

Date: 13:37:15 03/06/03


On March 06, 2003 at 11:45:19, Vincent Diepeveen wrote:

>On March 06, 2003 at 11:38:26, Vincent Diepeveen wrote:
>
>>On March 05, 2003 at 18:08:45, Robert Hyatt wrote:
>>
>>You seem to read my postings, but I only own a dual 1.6GHz K7 with a slow
>>133MHz bus. Because crafty is a cache eater, the few guys who run a dual XP on
>>a 150MHz or 166MHz bus of course rock with crafty on those machines.
>>
>>See postings of several of them here at CCC. Just look around. I remember one
>>from a few weeks ago even which i noticed.
>>
>>Note that a dual XP 2400 run at 2.1GHz (so the DDR RAM overclocked a very
>>little bit, the processor not so much) already gets 2.2MLN nodes a second
>>hands down.
>>
>>If you modified your program so that it has a small hashtable of 128KB in
>>total, into which you write the last 2 ply or so including qsearch, and you
>>write only entries with >= 2 ply depth left into the big global hashtable, then
>>this small hashtable would fit into the L2 cache of your processors.
>
>==> note that this applies also to pawnhashtable. A 64KB table in total for it
>will do. Calculating the others is cheaper than lookups for it in big main ram,
>or even a 2 trep table is possible. At the supercomputer i cannot allocate a big
>global pawn table either. Too expensive. I put it in a small table that fits in
>L2 cache. Of course the difference between diep and crafty is that an evaluation
>in diep takes hundreds of thousands of clocks and in crafty < 2000 clocks. So you
>simply cannot afford the latency of a RAM read, yet you do them everywhere.
>Most rotated bitboard lookups, which in fact need 1 MB of L2 cache, are luckily
>usually already in the L2 cache.
>
>I am sure you never measure such stuff but need tips like these, and then 2 years
>later it is done in crafty.

Right.  I never measure such stuff, but always seem to have profile data handy to
show you that your "guesses" are nonsense.  I'm just lucky I guess.
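
For reference, the two-table scheme proposed in the quoted text (a small table for the last plies and qsearch, the big global table for entries with >= 2 ply left) could look roughly like the C sketch below. It is an illustration only, not code from Crafty or Diep: the entry layout, the names tt_entry, tt_table, tt_init, tt_pick, tt_store and tt_probe, and the always-replace policy are all assumptions, and the sketch makes no claim about how much NPS such a change would actually gain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 16-byte entry; not the real Crafty or Diep layout. */
typedef struct {
    uint64_t key;    /* full hash key, compared on probe to verify a hit */
    int16_t  score;
    uint8_t  depth;  /* depth left when the entry was stored */
    uint8_t  used;   /* 0 = empty slot, 1 = holds an entry */
    uint32_t move;
} tt_entry;

typedef struct {
    tt_entry *slots;
    size_t    mask;  /* slot count - 1; slot count must be a power of two */
} tt_table;

enum {
    TT_SMALL_BYTES = 128 * 1024, /* size suggested in the quoted post */
    TT_DEEP_PLY    = 2           /* >= this many ply left goes to the big table */
};

/* Allocate a zeroed table of `bytes` bytes; returns 1 on success, 0 on failure. */
static int tt_init(tt_table *t, size_t bytes)
{
    size_t n = bytes / sizeof(tt_entry);
    assert(n != 0 && (n & (n - 1)) == 0);
    t->slots = calloc(n, sizeof(tt_entry));
    t->mask = n - 1;
    return t->slots != NULL;
}

/* Route by remaining depth: shallow and qsearch nodes only touch the small table. */
static tt_table *tt_pick(tt_table *small, tt_table *big, int depth_left)
{
    return depth_left >= TT_DEEP_PLY ? big : small;
}

/* Always-replace store (an assumption; real engines use smarter policies). */
static void tt_store(tt_table *small, tt_table *big, uint64_t key,
                     int depth_left, int score, uint32_t move)
{
    tt_table *t = tt_pick(small, big, depth_left);
    tt_entry *e = &t->slots[key & t->mask];
    e->key = key;
    e->score = (int16_t)score;
    e->depth = (uint8_t)(depth_left > 0 ? depth_left : 0);
    e->used = 1;
    e->move = move;
}

/* Returns the matching entry, or NULL on a miss. */
static const tt_entry *tt_probe(tt_table *small, tt_table *big,
                                uint64_t key, int depth_left)
{
    tt_table *t = tt_pick(small, big, depth_left);
    const tt_entry *e = &t->slots[key & t->mask];
    return (e->used && e->key == key) ? e : NULL;
}
```

With 16-byte entries the 128KB table holds 8192 slots. One trade-off of routing probes the same way as stores is that a shallow node never sees a deeper entry for the same position sitting in the big table; whether that loss is outweighed by the avoided RAM reads is exactly what would have to be measured.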


>
>>This will help both K7 and P4 a lot in NPS for crafty and make it less of a RAM
>>to cache eater, the K7 (perhaps P4 too i do not know, P3 doesn't have it though)
>>can benefit a lot there because it can then use the alpha 21264 feature where it
>>can read from the L2 cache of the other guy while reading. Majority will be
>>reads of course. > 50% of the cases will be reads.
>>
>>Crafty is nowadays so fast that the priority is avoiding the slow lookups to the
>>hashtable. A hashtable that fits within L2 cache is just the way to go.
>>
>>Note that for the transposition table it is possible to do as the pros do and
>>start each read from it at the start of a cache line. This is possible to do in C,
>>no need for assembly.
>>
>>Of course you would never invent that yourself but it will make crafty less of a
>>RAM speed eater and deliver more IPC. In fact so much IPC you will get then that
>>the future is bright and clear for crafty then with regards to NPS.
>>
>>Getting from the current 2.1MLN a second to 3MLN a second is no problem. In fact
>>with a very tiny last ply hashtable it is possible to even get more efficient as
>>you also hash qsearch.
>>
>>A lookup to the L2 cache is very cheap compared to an evaluation and the
>>majority of transpositions is always within a couple of hundreds of thousands of
>>nodes.
>>
>>
>>
>>
>>>On March 05, 2003 at 15:25:20, Vincent Diepeveen wrote:
>>>
>>>>On March 05, 2003 at 11:34:39, Robert Hyatt wrote:
>>>>
>>>>See the posted speeds of crafty on the very cheap dual K7 XPs with faster FSB
>>>>and then compare them with your own speeds. Crafty is a lot faster on those
>>>>machines than on yours. That is despite crafty being a cache eater (among
>>>>chess programs, not among specint).
>>>
>>>I will ask again, for a number > 2.16M nodes per second with Crafty.  I have no
>>>doubt some box can produce that number, including the 3.06ghz xeons.  But to
>>>date I have not seen one.
>>>
>>>I do mean a real number, not a "if I pushed the clock to 2.3ghz it would do
>>>this..." type number...
>>>
>>>We just received our dual 3.06 boxes today.  Unfortunately they came with an
>>>onboard SCSI controller that supports hardware raid and we had to zap
>>>everything to "unraid" the disks.  Hopefully I can post some 3.06SMT results
>>>tomorrow.  These are 3.06 xeons (dual) with 533mhz FSB so it will be
>>>interesting to see how they compare to my 2.8ghz with 400mhz FSB.
>>>
>>>>
>>>>>On March 05, 2003 at 10:21:32, Vincent Diepeveen wrote:
>>>>>
>>>>>>On March 04, 2003 at 22:47:04, Robert Hyatt wrote:
>>>>>>
>>>>>>>On March 04, 2003 at 17:39:42, Vincent Diepeveen wrote:
>>>>>>>
>>>>>>>>On March 04, 2003 at 16:32:33, Jay-R Delacruz wrote:
>>>>>>>>
>>>>>>>>>Do the deep versions of Fritz, Junior and Shredder support hyper-thread? Can
>>>>>>>>>someone please tell me before upgrading my PC to try the deep versions?
>>>>>>>>
>>>>>>>>I just read email from Frans Morsch. DeepFritz7 gets 5-10% speedup by
>>>>>>>>hyperthreading.
>>>>>>>>
>>>>>>>>Shredder gains more than that in nodes a second, but it gets no real speedup
>>>>>>>>from it, as its SMP speedup is already far smaller (1.5 or so), so it is
>>>>>>>>smarter to turn SMT/HT off for it. Perhaps shredder8 will fix this.
>>>>>>>>
>>>>>>>>For diep it speeds me up about 11% in NPS, but I cannot guarantee that on a 4
>>>>>>>>processor machine it will give a positive speedup.
>>>>>>>>
>>>>>>>>When running 2 processes at a P4 at 3.06ghz it will give for sure some speedup
>>>>>>>>because it goes from 100k nps to 120k nps. Nearly 20% speedup it gets with it
>>>>>>>>(18.6 or something) which gives a positive speedup also in depth.
>>>>>>>>
>>>>>>>>For deepjunior we know that it already works badly on an 8 processor Xeon 1.6Ghz
>>>>>>>>versus a 4 processor Xeon 1.9Ghz, so I *assume* for now that SMT/HT will not
>>>>>>>>give it much benefit at all, but perhaps Amir or Shay wants to give a
>>>>>>>>statement regarding this themselves.
>>>>>>>>
>>>>>>>>We talk of course about the SMT/HT from Xeon processors up to 2.8Ghz now for
>>>>>>>>those which have it enabled. For the P4 3.06Ghz and also Xeons of that and above
>>>>>>>>things are a different matter.
>>>>>>>
>>>>>>>You keep saying that.  It continues to be _wrong_.  The 2.8 xeon has the
>>>>>>>_exact_ same cpu core (and SMT) that the 3.06 xeon and PIV has.  And when I
>>>>>>>say _exactly_ I mean _exactly_.  This is _directly_ from Intel... for the
>>>>>>>record.
>>>>>>
>>>>>>Try a better source than the marketing department: try some hardware
>>>>>>experts, for example at: http://www.realworldtech.com/index.cfm
>>>>>
>>>>>
>>>>>I don't use "marketing types".  And I can send you some dual 3.06 xeon test
>>>>>results that mirror my dual 2.8's _exactly_ in terms of the 20% to 30% raw
>>>>>NPS figures.  I sent my "worst case positions" to someone with one of these
>>>>>machines and he got the _same_ 20% improvement at 3.06 that I got with my
>>>>>2.8's.
>>>>>
>>>>>Your data is simply wrong.  The xeon core has _not_ changed from 2.8ghz to
>>>>>3.06 ghz, and I have no idea why you want to supply your "disinformation"
>>>>>that it has.
>>>>>
>>>>>We have four of these on the way (dual 3.06 dell 650s) for faculty.  They
>>>>>have shipped (2/28) so they should be here any time.  I'll run the tests and
>>>>>post the results to further debunk this "myth" that 3.06's are different...
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>Best regards,
>>>>>>>>Vincent
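
The quoted remark that transposition-table reads can be made to start on a cache line boundary in plain C, without assembly, can be illustrated with the alignment features of C11 and later. The sketch below assumes a 64-byte cache line and a hypothetical 4-slot bucket; the names ca_slot, ca_bucket, ca_alloc and ca_bucket_for are made up for this example and are not taken from any engine. With an older compiler the same effect is usually obtained by over-allocating with malloc and rounding the pointer up to the next multiple of the line size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CA_LINE 64  /* assumed cache line size in bytes */

typedef struct {
    uint64_t key;
    uint64_t data;   /* packed score, depth, move, flags (hypothetical) */
} ca_slot;

/* Four 16-byte slots fill exactly one assumed cache line. */
typedef struct {
    _Alignas(CA_LINE) ca_slot slots[4];
} ca_bucket;

_Static_assert(sizeof(ca_bucket) == CA_LINE, "bucket must be exactly one line");

/* Allocate `count` zeroed buckets starting on a cache line boundary.
   `count` must be a power of two. Release the table with free(). */
static ca_bucket *ca_alloc(size_t count)
{
    assert(count != 0 && (count & (count - 1)) == 0);
    ca_bucket *t = aligned_alloc(CA_LINE, count * sizeof(ca_bucket));
    if (t != NULL)
        memset(t, 0, count * sizeof(ca_bucket));
    return t;
}

/* Every bucket address is a multiple of CA_LINE, so probing one bucket
   touches a single line. */
static ca_bucket *ca_bucket_for(ca_bucket *table, size_t count, uint64_t key)
{
    return &table[key & (count - 1)];
}
```

A probe would then scan the four slots of the bucket returned by ca_bucket_for(table, count, key) for a matching key. Because the bucket is exactly one line long and starts on a line boundary, all four comparisons hit the same line.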




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.