Author: Robert Hyatt
Date: 13:36:15 03/06/03
On March 06, 2003 at 11:38:26, Vincent Diepeveen wrote:

>On March 05, 2003 at 18:08:45, Robert Hyatt wrote:
>
>you read postings of me seemingly, but i am only owning a dual 1.6ghz K7 with a
>slow 133Mhz bus, and because crafty is a cache eater a few guys which are at
>150Mhz or 166Mhz bus with a dual XP, they rock of course with crafty on that
>machine.
>
>See postings of several of them here at CCC. Just look around. I remember one
>from a few weeks ago even which i noticed.
>
>note that a dual XP 2400 already when put to 2.1ghz (so a very little bit
>overclocked the DDR ram, not so much the processor) is getting
>2.2MLN nodes a second hands down.

I'm waiting for someone to post a real result here from a _stock_ CPU. I don't go for overclocking and I don't consider overclocked numbers. And this NPS is on the bench command, with no crafty.rc file (default hash and everything).

>
>if you would modify your thing such that it has a 128KB hashtable in total where
>you write in last 2 ply or so including qsearch, and in the global big hashtable
>you write >= 2 ply depthleft, then this small hashtable would get into the L2
>cache of your processors.

What's the point? We all run the "bench" command the same way, using the default hash table size.

>
>This will help both K7 and P4 a lot in NPS for crafty and make it less of a RAM
>to cache eater, the K7 (perhaps P4 too i do not know, P3 doesn't have it though)
>can benefit a lot there because it can then use the alpha 21264 feature where it
>can read from the L2 cache of the other guy while reading. Majority will be
>reads of course.

> 50% of the cases will be reads.

>
>Crafty is nowadays that fast that priority is avoiding the slow lookups to the
>hashtable. A hashtable that fits within L2 cache is just the way to go.
>
>Note that for the transpositiontable it is possible to do like the pro's do and
>put each read to it at the start of a cache line. This is possible to do in C,
>no need for assembly.

Only if you make the entry a power of two. Mine is close now in that a triplet is 48 consecutive bytes. But I don't consider hashing a big problem. I could turn it totally off to see what it does to NPS of course, but I don't really care.

>
>Of course you would never invent that yourself but it will make crafty less of a
>RAM speed eater and deliver more IPC. In fact so much IPC you will get then that
>the future is bright and clear for crafty then with regards to NPS.
>
>Getting from the current 2.1MLN a second to 3MLN a second is no problem. In fact
>with a very tiny last ply hashtable it is possible to even get more efficient as
>you also hash qsearch.

That is total baloney. Since more than 50% of my total time is spent in Evaluate(), there is no way to get from 2.1M to 3M by fiddling with the hash probe stuff. Here is the most recent profile output:

2.37  34.50  0.96  808908  0.00  0.00  HashStore
2.03  35.44  0.94  944844  0.00  0.00  HashProbe

That first column is % time. So my hashing code is about 4.4% of the total search time. How are you going to reduce that and make the program go more than 40% faster to reach 3M? The obvious answer is "you aren't." And I don't see why you don't see the flaw in your logic.

>
>A lookup to the L2 cache is very cheap compared to an evaluation and the
>majority of transpositions is always within a couple of hundreds of thousands of
>nodes.

No lookup at all would make me about 4.6% faster, but make the size of the tree _way_ larger. This is the kind of illogical reasoning that drives people mad.

>
>
>>On March 05, 2003 at 15:25:20, Vincent Diepeveen wrote:
>>
>>>On March 05, 2003 at 11:34:39, Robert Hyatt wrote:
>>>
>>>see the posted speeds of crafty at the very cheap dual K7 XPs with faster FSB
>>>and then compare that with your own speeds. Crafty is a lot faster on those
>>>machines than yours. That despite crafty is a cache eater (among the
>>>chessprograms, not among specint).
>>
>>I will ask again, for a number > 2.16M nodes per second with Crafty.
>>I have no doubt some box can produce that number, including the 3.06ghz xeons.
>>But to date I have not seen one.
>>
>>I do mean a real number, not a "if I pushed the clock to 2.3ghz it would do
>>this..." type number...
>>
>>We just received our dual 3.06 boxes today. Unfortunately they came with an
>>onboard SCSI controller that supports hardware raid and we had to zap
>>everything to "unraid" the disks. Hopefully I can post some 3.06SMT results
>>tomorrow. These are 3.06 xeons (dual) with 533mhz FSB so it will be
>>interesting to see how they compare to my 2.8ghz with 400mhz FSB.
>>
>>>
>>>>On March 05, 2003 at 10:21:32, Vincent Diepeveen wrote:
>>>>
>>>>>On March 04, 2003 at 22:47:04, Robert Hyatt wrote:
>>>>>
>>>>>>On March 04, 2003 at 17:39:42, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On March 04, 2003 at 16:32:33, Jay-R Delacruz wrote:
>>>>>>>
>>>>>>>>Do the deep versions of Fritz, Junior and Shredder support hyper-thread? Can
>>>>>>>>someone please tell me before upgrading my PC to try the deep versions?
>>>>>>>
>>>>>>>I just read email from Frans Morsch. DeepFritz7 gets 5-10% speedup by
>>>>>>>hyperthreading.
>>>>>>>
>>>>>>>Shredder gets more speedup in nodes a second than that, but it gets no speedup
>>>>>>>from it as it gets SMP already a far smaller speedup (1.5 or so), so it is
>>>>>>>smarter to turn SMT/HT off for it. perhaps shredder8 will fix this.
>>>>>>>
>>>>>>>For diep it speeds me up about 11% in NPS but i cannot guarantee that at a 4
>>>>>>>processor it will give a positive speedup.
>>>>>>>
>>>>>>>When running 2 processes at a P4 at 3.06ghz it will give for sure some speedup
>>>>>>>because it goes from 100k nps to 120k nps. Nearly 20% speedup it gets with it
>>>>>>>(18.6 or something) which gives a positive speedup also in depth.
>>>>>>>
>>>>>>>For deepjunior we know that it already works bad at 8 processor Xeon 1.6Ghz
>>>>>>>versus 4 processor Xeon 1.9Ghz, so i *assume* for now that SMT/HT will not give
>>>>>>>it much benefit for it at all, but perhaps Amir or Shay wants to give a
>>>>>>>statement regarding this themselves.
>>>>>>>
>>>>>>>We talk of course about the SMT/HT from Xeon processors up to 2.8Ghz now for
>>>>>>>those which have it enabled. For the P4 3.06Ghz and also Xeons of that and above
>>>>>>>things are a different matter.
>>>>>>
>>>>>>You keep saying that. It continues to be _wrong_. The 2.8 xeon has the
>>>>>>_exact_ same cpu core (and SMT) that the 3.06 xeon and PIV has. And when I
>>>>>>say _exactly_ I mean _exactly_. This is _directly_ from Intel... for the
>>>>>>record.
>>>>>
>>>>>try some better source instead of the marketing department try some hardware
>>>>>experts. for example at: http://www.realworldtech.com/index.cfm
>>>>
>>>>
>>>>I don't use "marketing types". And I can send you some dual 3.06 xeon test
>>>>results that mirror my dual 2.8's _exactly_ in terms of the 20% to 30% raw
>>>>NPS figures. I sent my "worst case positions" to someone with one of these
>>>>machines and he got the _same_ 20% improvement at 3.06 that I got with my
>>>>2.8's.
>>>>
>>>>Your data is simply wrong. The xeon core has _not_ changed from 2.8ghz to
>>>>3.06 ghz, and I have no idea why you want to supply your "disinformation"
>>>>that it has.
>>>>
>>>>We have four of these on the way (dual 3.06 dell 650s) for faculty. They have
>>>>shipped (2/28) so they should be here any time. I'll run the tests and post
>>>>the results to further debunk this "myth" that 3.06's are different...
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>Best regards,
>>>>>>>Vincent
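
The profile argument in this reply can be checked with Amdahl's law. What follows is a minimal sketch in C, not code from Crafty: the function names `max_speedup` and `required_fraction` are made up for illustration, and it assumes that the HashStore and HashProbe percentages in the profile (2.37% and 2.03%) are the whole cost of hashing and that NPS scales inversely with time per node.

```c
#include <assert.h>

/* Amdahl's law, limiting case: if a fraction f of the run time
   (0 <= f < 1) is made infinitely fast, the program as a whole
   speeds up by a factor of 1 / (1 - f). */
double max_speedup(double f)
{
    assert(f >= 0.0 && f < 1.0);
    return 1.0 / (1.0 - f);
}

/* Inverse question: what fraction of the run time would have to be
   removed entirely to go from current_nps to target_nps? */
double required_fraction(double current_nps, double target_nps)
{
    assert(current_nps > 0.0 && target_nps >= current_nps);
    return 1.0 - current_nps / target_nps;
}
```

With f = (2.37 + 2.03) / 100 = 0.044, `max_speedup(f)` is about 1.046, so 2.1M NPS tops out near 2.2M even if hashing cost nothing at all. `required_fraction(2.1e6, 3.0e6)` is 0.30: hashing would have to account for 30% of the run time before any hash-table change could reach 3M NPS.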
Last modified: Thu, 15 Apr 21 08:11:13 -0700