Author: Robert Hyatt
Date: 13:37:15 03/06/03
On March 06, 2003 at 11:45:19, Vincent Diepeveen wrote:

>On March 06, 2003 at 11:38:26, Vincent Diepeveen wrote:
>
>>On March 05, 2003 at 18:08:45, Robert Hyatt wrote:
>>
>>you read postings of me seemingly, but i am only owning a dual 1.6ghz K7 with a
>>slow 133Mhz bus, and because crafty is a cache eater a few guys which are at
>>150Mhz or 166Mhz bus with a dual XP, they rock of course with crafty on that
>>machine.
>>
>>See postings of several of them here at CCC. Just look around. I remember one
>>from a few weeks ago even which i noticed.
>>
>>note that a dual XP 2400 already when put to 2.1ghz (so a very little bit
>>overclocked the DDR ram, not so much the processor) is getting
>>2.2MLN nodes a second hands down.
>>
>>if you would modify your thing such that it has a 128KB hashtable in total where
>>you write in last 2 ply or so including qsearch, and in the global big hashtable
>>you write >= 2 ply depthleft, then this small hashtable would get into the L2
>>cache of your processors.
>
>==> note that this applies also to pawnhashtable. A 64KB table in total for it
>will do. Calculating the others is cheaper than lookups for it in big main ram,
>or even a 2 trep table is possible. At the supercomputer i cannot allocate a big
>global pawn table either. Too expensive. I put it in a small table that fits in
>L2 cache. Of course the difference with diep and crafty is that an evaluation in
>diep takes hundreds thousands of clocks and in crafty < 2000 clocks. So the
>latency of a RAM read you can't afford simply, yet you do try them everywhere.
>Most rotated bitboard lookups which require 1 MB L2 cache in fact, are lucky
>usually already in the L2 cache.
>
>I am sure you never measure such stuff but need tips like these then 2 years
>later it is done in crafty.

Right. I never measure such stuff, but always seem to have profile data handy to
show you that your "guesses" are nonsense. I'm just lucky I guess.

>
>>This will help both K7 and P4 a lot in NPS for crafty and make it less of a RAM
>>to cache eater, the K7 (perhaps P4 too i do not know, P3 doesn't have it though)
>>can benefit a lot there because it can then use the alpha 21264 feature where it
>>can read from the L2 cache of the other guy while reading. Majority will be
>>reads of course.
>

50% of the cases will be reads.

>>
>>Crafty is nowadays that fast that priority is avoiding the slow lookups to the
>>hashtable. A hashtable that fits within L2 cache is just the way to go.
>>
>>Note that for the transpositiontable it is possible to do like the pro's do and
>>put each read to it at the start of a cache line. This is possible to do in C,
>>no need for assembly.
>>
>>Of course you would never invent that yourself but it will make crafty less of a
>>RAM speed eater and deliver more IPC. In fact so much IPC you will get then that
>>the future is bright and clear for crafty then with regards to NPS.
>>
>>Getting from the current 2.1MLN a second to 3MLN a second is no problem. In fact
>>with a very tiny last ply hashtable it is possible to even get more efficient as
>>you also hash qsearch.
>>
>>A lookup to the L2 cache is very cheap compared to an evaluation and the
>>majority of transpositions is always within a couple of hundreds of thousands of
>>nodes.
>>
>>>On March 05, 2003 at 15:25:20, Vincent Diepeveen wrote:
>>>
>>>>On March 05, 2003 at 11:34:39, Robert Hyatt wrote:
>>>>
>>>>see the posted speeds of crafty at the very cheap dual K7 XPs with faster FSB
>>>>and then compare that with your own speeds. Crafty is a lot faster on those
>>>>machines than yours. That despite crafty is a cache eater (among the
>>>>chessprograms, not among specint).
>>>
>>>I will ask again, for a number > 2.16M nodes per second with Crafty. I have no
>>>doubt some box can produce that number, including the 3.06ghz xeons. But to
>>>date I have not seen one.
>>>
>>>I do mean a real number, not a "if I pushed the clock to 2.3ghz it would do
>>>this..." type number...
>>>
>>>We just received our dual 3.06 boxes today. Unfortunately they came with an
>>>onboard SCSI controller that supports hardware raid and we had to zap
>>>everything to "unraid" the disks. Hopefully I can post some 3.06SMT results
>>>tomorrow. These are 3.06 xeons (dual) with 533mhz FSB so it will be
>>>interesting to see how they compare to my 2.8ghz with 400mhz FSB.
>>>
>>>>
>>>>>On March 05, 2003 at 10:21:32, Vincent Diepeveen wrote:
>>>>>
>>>>>>On March 04, 2003 at 22:47:04, Robert Hyatt wrote:
>>>>>>
>>>>>>>On March 04, 2003 at 17:39:42, Vincent Diepeveen wrote:
>>>>>>>
>>>>>>>>On March 04, 2003 at 16:32:33, Jay-R Delacruz wrote:
>>>>>>>>
>>>>>>>>>Do the deep versions of Fritz, Junior and Shredder support hyper-thread? Can
>>>>>>>>>someone please tell me before upgrading my PC to try the deep versions?
>>>>>>>>
>>>>>>>>I just read email from Frans Morsch. DeepFritz7 gets 5-10% speedup by
>>>>>>>>hyperthreading.
>>>>>>>>
>>>>>>>>Shredder gets more speedup in nodes a second than that, but it gets no speedup
>>>>>>>>from it as it gets SMP already a far smaller speedup (1.5 or so), so it is
>>>>>>>>smarter to turn SMT/HT off for it. perhaps shredder8 will fix this.
>>>>>>>>
>>>>>>>>For diep it speeds me up about 11% in NPS but i cannot guarantee that at a 4
>>>>>>>>processor it will give a positive speedup.
>>>>>>>>
>>>>>>>>When running 2 processes at a P4 at 3.06ghz it will give for sure some speedup
>>>>>>>>because it goes from 100k nps to 120k nps. Nearly 20% speedup it gets with it
>>>>>>>>(18.6 or something) which gives a positive speedup also in depth.
>>>>>>>>
>>>>>>>>For deepjunior we know that it already works bad at 8 processor Xeon 1.6Ghz
>>>>>>>>versus 4 processor Xeon 1.9Ghz, so i *assume* for now that SMT/HT will not give
>>>>>>>>it much benefit for it at all, but perhaps Amir or Shay wants to give a
>>>>>>>>statement regarding this themselves.
>>>>>>>>
>>>>>>>>We talk of course about the SMT/HT from Xeon processors up to 2.8Ghz now for
>>>>>>>>those which have it enabled. For the P4 3.06Ghz and also Xeons of that and above
>>>>>>>>things are a different matter.
>>>>>>>
>>>>>>>You keep saying that. It continues to be _wrong_. The 2.8 xeon has the
>>>>>>>_exact_ same cpu core (and SMT) that the 3.06 xeon and PIV has. And when I
>>>>>>>say _exactly_ I mean _exactly_. This is _directly_ from Intel... for the
>>>>>>>record.
>>>>>>
>>>>>>try some better source instead of the marketing department try some hardware
>>>>>>experts. for example at: http://www.realworldtech.com/index.cfm
>>>>>
>>>>>I don't use "marketing types". And I can send you some dual 3.06 xeon test
>>>>>results that mirror my dual 2.8's _exactly_ in terms of the 20% to 30% raw
>>>>>NPS figures. I sent my "worst case positions" to someone with one of these
>>>>>machines and he got the _same_ 20% improvement at 3.06 that I got with my
>>>>>2.8's.
>>>>>
>>>>>Your data is simply wrong. The xeon core has _not_ changed from 2.8ghz to
>>>>>3.06 ghz, and I have no idea why you want to supply your "disinformation"
>>>>>that it has.
>>>>>
>>>>>We have four of these on the way (dual 3.06 dell 650s) for faculty. They
>>>>>have shipped (2/28) so they should be here any time. I'll run the tests and
>>>>>post the results to further debunk this "myth" that 3.06's are different...
>>>>>
>>>>>>>>
>>>>>>>>Best regards,
>>>>>>>>Vincent
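
For readers following the hash table part of the exchange, here is a rough sketch in C of the two ideas Vincent describes in the quoted text: a small table (128KB, the figure from his post) for entries with fewer than 2 plies of depth left, including qsearch, alongside a big global table for depthleft >= 2, and table reads that start on a cache line boundary. This is not Crafty or Diep code. The names (Entry, Bucket, tt_init, tt_store, tt_probe), the 64-byte line size, the 16-byte entry layout and the replacement rule are all assumptions made for the illustration, and nothing here says whether the scheme actually helps NPS, which is the point under dispute.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE    64                     /* assumed cache line size */
#define SMALL_BYTES   (128 * 1024)           /* the 128KB figure from the post */
#define SMALL_BUCKETS (SMALL_BYTES / CACHE_LINE)
#define SPLIT_DEPTH   2                      /* depthleft >= 2 goes to the big table */

typedef struct {
    uint64_t key;     /* full hash key, used to verify a hit (assumed) */
    uint32_t move;
    int16_t  score;
    int8_t   depth;   /* remaining depth; 0 or less stands for qsearch */
    uint8_t  flags;   /* bound type; 0 marks an empty slot (assumed) */
} Entry;

/* Four 16-byte entries fill one cache line, so one probe reads one line. */
typedef struct {
    alignas(CACHE_LINE) Entry slot[4];
} Bucket;

static_assert(sizeof(Entry) == 16, "entry layout assumption");
static_assert(sizeof(Bucket) == CACHE_LINE, "one bucket per cache line");

static Bucket  small_table[SMALL_BUCKETS];   /* meant to stay cache resident */
static Bucket *big_table;                    /* lives in main memory */
static size_t  big_buckets;                  /* always a power of two */

/* Allocate the big table (rounded down to a power of two) and clear both. */
int tt_init(size_t big_bytes) {
    big_buckets = 1;
    while (big_buckets * 2 * sizeof(Bucket) <= big_bytes)
        big_buckets *= 2;
    free(big_table);
    big_table = aligned_alloc(CACHE_LINE, big_buckets * sizeof(Bucket));
    if (big_table == NULL)
        return -1;
    memset(big_table, 0, big_buckets * sizeof(Bucket));
    memset(small_table, 0, sizeof small_table);
    return 0;
}

/* Route by remaining depth: shallow nodes never touch the big table. */
static Bucket *bucket_for(uint64_t key, int depth) {
    if (depth >= SPLIT_DEPTH)
        return &big_table[key & (big_buckets - 1)];
    return &small_table[key & (SMALL_BUCKETS - 1)];
}

/* Store an entry; flags must be nonzero. Reuses an empty or same-key slot,
   otherwise overwrites the shallowest entry in the bucket. */
void tt_store(uint64_t key, int depth, int score, uint32_t move, uint8_t flags) {
    Bucket *b = bucket_for(key, depth);
    Entry *victim = &b->slot[0];
    for (int i = 0; i < 4; i++) {
        Entry *e = &b->slot[i];
        if (e->flags == 0 || e->key == key) { victim = e; break; }
        if (e->depth < victim->depth) victim = e;
    }
    victim->key   = key;
    victim->move  = move;
    victim->score = (int16_t)score;
    victim->depth = (int8_t)depth;
    victim->flags = flags;
}

/* Return a matching entry searched at least as deep as requested, or NULL. */
const Entry *tt_probe(uint64_t key, int depth) {
    const Bucket *b = bucket_for(key, depth);
    for (int i = 0; i < 4; i++) {
        const Entry *e = &b->slot[i];
        if (e->flags != 0 && e->key == key && e->depth >= depth)
            return e;
    }
    return NULL;
}
```

Call tt_init once before searching, then tt_store and tt_probe with the remaining depth of the node. Because routing depends only on that depth, nodes in the last plies and in qsearch read and write the 128KB table alone. The trade-off in this sketch is that a shallow probe cannot see a deeper entry for the same position that sits in the big table.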
Last modified: Thu, 15 Apr 21 08:11:13 -0700