Computer Chess Club Archives


Subject: Re: Hyper-Threading Technology from Intel-to Hype or Not to Hype?

Author: Vincent Diepeveen

Date: 08:45:19 03/06/03


On March 06, 2003 at 11:38:26, Vincent Diepeveen wrote:

>On March 05, 2003 at 18:08:45, Robert Hyatt wrote:
>
>you seemingly read my postings, but i only own a dual 1.6ghz K7 with a
>slow 133Mhz bus. Because crafty is a cache eater, the few guys who run a
>dual XP at a 150Mhz or 166Mhz bus of course rock with crafty on that
>machine.
>
>See postings of several of them here at CCC. Just look around. I remember one
>from a few weeks ago even which i noticed.
>
>note that a dual XP 2400, when put to 2.1ghz (so with the DDR ram overclocked
>a very little bit, not so much the processor), is already getting
>2.2MLN nodes a second hands down.
>
>if you would modify your thing such that it has a 128KB hashtable in total where
>you write in last 2 ply or so including qsearch, and in the global big hashtable
>you write >= 2 ply depthleft, then this small hashtable would get into the L2
>cache of your processors.

==> note that this also applies to the pawn hashtable. A 64KB table in total
will do for it. Calculating the others is cheaper than looking them up in big
main RAM, or even a 2 trep table is possible. At the supercomputer i cannot
allocate a big global pawn table either. Too expensive. I put it in a small
table that fits in L2 cache. Of course the difference between diep and crafty is
that an evaluation in diep takes hundreds of thousands of clocks and in crafty
< 2000 clocks. So you simply can't afford the latency of a RAM read, yet you do
try them everywhere. Luckily, most rotated bitboard lookups, which in fact
require 1 MB of L2 cache, are usually already in the L2 cache.
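
A minimal sketch in C of the two-table split described above: positions with fewer than 2 ply of depth left (including qsearch) go into a 128KB table that can stay in L2, and everything with >= 2 ply left goes into the big global table. The entry layout, the names (HashEntry, hash_store, hash_probe), the 16 MB size of the big table and the always-replace policy are assumptions made for illustration, not code from crafty or diep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte entry; a real engine packs more into it. */
typedef struct {
    uint64_t key;    /* full hash key, used to verify a hit        */
    int16_t  score;
    int8_t   depth;  /* depth left when stored (<= 0 in qsearch)   */
    uint8_t  flags;  /* 0 = empty slot, 1 = used                   */
    uint32_t move;
} HashEntry;

#define SMALL_ENTRIES 8192u        /* 8192 * 16 bytes = 128KB, meant to fit in L2 */
#define BIG_ENTRIES   (1u << 20)   /* assumed size: 16 MB in main RAM             */

static_assert(sizeof(HashEntry) == 16, "entry size assumed to be 16 bytes");
static_assert(SMALL_ENTRIES * sizeof(HashEntry) == 128 * 1024, "small table is 128KB");

static HashEntry small_table[SMALL_ENTRIES];
static HashEntry big_table[BIG_ENTRIES];

/* Pick the table by remaining depth: >= 2 ply goes to the big table. */
static HashEntry *slot_for(uint64_t key, int depth_left) {
    if (depth_left >= 2)
        return &big_table[key & (BIG_ENTRIES - 1)];
    return &small_table[key & (SMALL_ENTRIES - 1)];
}

/* Always-replace store, kept simple on purpose. */
static void hash_store(uint64_t key, int depth_left, int score, uint32_t move) {
    HashEntry *e = slot_for(key, depth_left);
    e->key   = key;
    e->score = (int16_t)score;
    e->depth = (int8_t)depth_left;
    e->flags = 1;
    e->move  = move;
}

/* Returns the entry on a usable hit, NULL otherwise. */
static const HashEntry *hash_probe(uint64_t key, int depth_left) {
    const HashEntry *e = slot_for(key, depth_left);
    if (e->flags && e->key == key && e->depth >= depth_left)
        return e;
    return NULL;
}
```

In this sketch a node with 0 or 1 ply left never touches the big table, so the most frequent lookups stay within the 128KB that is sized for L2.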

I am sure you never measure such stuff but need tips like these; then 2 years
later it is done in crafty.

>This will help both K7 and P4 a lot in NPS for crafty and make it less of a RAM
>to cache eater, the K7 (perhaps P4 too i do not know, P3 doesn't have it though)
>can benefit a lot there because it can then use the alpha 21264 feature where it
>can read from the L2 cache of the other guy while reading. Majority will be
>reads of course. > 50% of the cases will be reads.
>
>Crafty is nowadays that fast that priority is avoiding the slow lookups to the
>hashtable. A hashtable that fits within L2 cache is just the way to go.
>
>Note that for the transposition table it is possible to do like the pros do and
>put each read to it at the start of a cache line. This is possible to do in C,
>no need for assembly.
>
>Of course you would never invent that yourself but it will make crafty less of a
>RAM speed eater and deliver more IPC. In fact so much IPC you will get then that
>the future is bright and clear for crafty then with regards to NPS.
>
>Getting from the current 2.1MLN a second to 3MLN a second is no problem. In fact
>with a very tiny last ply hashtable it is possible to even get more efficient as
>you also hash qsearch.
>
>A lookup to the L2 cache is very cheap compared to an evaluation and the
>majority of transpositions is always within a couple of hundreds of thousands of
>nodes.
>
>
>
>
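
A rough sketch in C of the point above about putting each transposition-table read at the start of a cache line. It assumes a 64-byte cache line and a C11 compiler with aligned_alloc; the names (TTEntry, TTBucket, tt_alloc, tt_bucket) and the 16-byte entry are made up for illustration and are not taken from any engine.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64   /* assumed line size; check the target CPU */

/* Hypothetical 16-byte entry. */
typedef struct {
    uint64_t key;
    uint64_t data;   /* move, score, depth, flags packed by the engine */
} TTEntry;

/* One bucket = exactly one cache line holding 4 entries. */
typedef struct {
    alignas(CACHE_LINE) TTEntry slot[CACHE_LINE / sizeof(TTEntry)];
} TTBucket;

static_assert(sizeof(TTBucket) == CACHE_LINE, "bucket must fill one cache line");

/* Allocate a zeroed table whose first bucket starts on a cache-line boundary.
   'buckets' is assumed to be a power of two. */
static TTBucket *tt_alloc(size_t buckets) {
    TTBucket *t = aligned_alloc(CACHE_LINE, buckets * sizeof(TTBucket));
    if (t != NULL)
        memset(t, 0, buckets * sizeof(TTBucket));
    return t;
}

/* Every probe lands at the start of a line; the 4 slots share that line. */
static TTBucket *tt_bucket(TTBucket *table, size_t buckets, uint64_t key) {
    return &table[key & (buckets - 1)];
}
```

Because each bucket is exactly one line and the table base is aligned, a probe that checks all four slots costs a single cache-line fetch instead of possibly straddling two lines.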
>>On March 05, 2003 at 15:25:20, Vincent Diepeveen wrote:
>>
>>>On March 05, 2003 at 11:34:39, Robert Hyatt wrote:
>>>
>>>see the posted speeds of crafty at the very cheap dual K7 XPs with faster FSB
>>>and then compare that with your own speeds. Crafty is a lot faster on those
>>>machines than yours. That despite crafty is a cache eater (among the
>>>chessprograms, not among specint).
>>
>>I will ask again, for a number > 2.16M nodes per second with Crafty.  I have no
>>doubt some box can produce that number, including the 3.06ghz xeons.  But to date
>>I have not seen one.
>>
>>I do mean a real number, not a "if I pushed the clock to 2.3ghz it would do this..."
>>type number...
>>
>>We just received our dual 3.06 boxes today.  Unfortunately they came with an onboard
>>SCSI controller that supports hardware raid and we had to zap everything to "unraid" the
>>disks.  Hopefully I can post some 3.06SMT results tomorrow.  These are 3.06 xeons (dual)
>>with 533mhz FSB so it will be interesting to see how they compare to my 2.8ghz with
>>400mhz FSB.
>>
>>>
>>>>On March 05, 2003 at 10:21:32, Vincent Diepeveen wrote:
>>>>
>>>>>On March 04, 2003 at 22:47:04, Robert Hyatt wrote:
>>>>>
>>>>>>On March 04, 2003 at 17:39:42, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On March 04, 2003 at 16:32:33, Jay-R Delacruz wrote:
>>>>>>>
>>>>>>>>Do the deep versions of Fritz, Junior and Shredder support hyper-thread? Can
>>>>>>>>someone please tell me before upgrading my PC to try the deep versions?
>>>>>>>
>>>>>>>I just read email from Frans Morsch. DeepFritz7 gets 5-10% speedup by
>>>>>>>hyperthreading.
>>>>>>>
>>>>>>>Shredder gets more speedup in nodes a second than that, but it gets no speedup
>>>>>>>from it as it gets SMP already a far smaller speedup (1.5 or so), so it is
>>>>>>>smarter to turn SMT/HT off for it. perhaps shredder8 will fix this.
>>>>>>>
>>>>>>>For diep it speeds me up about 11% in NPS but i cannot guarantee that at a 4
>>>>>>>processor it will give a positive speedup.
>>>>>>>
>>>>>>>When running 2 processes at a P4 at 3.06ghz it will give for sure some speedup
>>>>>>>because it goes from 100k nps to 120k nps. Nearly 20% speedup it gets with it
>>>>>>>(18.6 or something) which gives a positive speedup also in depth.
>>>>>>>
>>>>>>>For deepjunior we know that it already works bad at 8 processor Xeon 1.6Ghz
>>>>>>>versus 4 processor Xeon 1.9Ghz, so i *assume* for now that SMT/HT will not give
>>>>>>>it much benefit for it at all, but perhaps Amir or Shay wants to give a
>>>>>>>statement regarding this themselves.
>>>>>>>
>>>>>>>We talk of course about the SMT/HT from Xeon processors up to 2.8Ghz now for
>>>>>>>those which have it enabled. For the P4 3.06Ghz and also Xeons of that and above
>>>>>>>things are a different matter.
>>>>>>
>>>>>>You keep saying that.  It continues to be _wrong_.  The 2.8 xeon has the
>>>>>>_exact_ same cpu core (and SMT) that the 3.06 xeon and PIV has.  And when I
>>>>>>say _exactly_ I mean _exactly_.  This is _directly_ from Intel... for the
>>>>>>record.
>>>>>
>>>>>try some better source instead of the marketing department try some hardware
>>>>>experts. for example at: http://www.realworldtech.com/index.cfm
>>>>
>>>>
>>>>I don't use "marketing types".  And I can send you some dual 3.06 xeon test results that
>>>>mirror my dual 2.8's _exactly_ in terms of the 20% to 30% raw NPS figures.  I sent my
>>>>"worst case positions" to someone with one of these machines and he got the _same_
>>>>20% improvement at 3.06 that I got with my 2.8's.
>>>>
>>>>Your data is simply wrong.  The xeon core has _not_ changed from 2.8ghz to 3.06 ghz,
>>>>and I have no idea why you want to supply your "disinformation" that it has.
>>>>
>>>>We have four of these on the way (dual 3.06 dell 650s) for faculty.  They have shipped
>>>>(2/28) so they should be here any time.  I'll run the tests and post the results to further
>>>>debunk this "myth" that 3.06's are different...
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>Best regards,
>>>>>>>Vincent



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.