Author: Robert Hyatt
Date: 21:30:13 06/07/02
On June 06, 2002 at 20:54:41, Vincent Diepeveen wrote:

>On June 05, 2002 at 22:01:30, Robert Hyatt wrote:
>
>PERHAPS IT IS TIME YOU PROFILE CRAFTY AGAIN.

So the profile from _last week_ was too old for you? I haven't changed one
line of code since that profile run was done... remember that I am doing
profile-based optimizations, so I _must_ profile every time I change a line
to get the fastest Crafty version. Care to re-think that statement now???

>Previous run you did must have been 10 years ago or so
>that you guess it is needing 50% system time for its
>evaluation.

Eh? Crafty hasn't been around 10 years. That profile was very recent.

>The position where evaluation takes most system time is
>the opening of course. A long profile run from there
>is most 'lucky' from crafty's viewpoint.

NO... the most time happens when there are passed pawns, and king safety
issues, because those aren't hashed and are computed dynamically each time
a position is evaluated.

>No way i can let it eat more than 42% of the system time
>for *all* the evaluation functions together.
>
>Note that the pawn structure eats very little system time here.

I assume you profile on the opening position? I do something better than
that, so that the feedback optimizations work for _all_ parts of the game.

>Perhaps taking many MBs for pawnhashtable size is a bit big?
>
>I took 96MB for hashtable and 12MB for pawntable or so.
>
>In short in the far endgame it'll be more like 20% of the system
>time going to evaluation.

So? The middlegame is where the "action" is...

>Note that this was a parallel compile, but not a parallel run. So
>in reality some overhead is wasted to that too, but i see that as
>a loss everyone loses anyway.

There is no loss in Crafty for a parallel compile with no parallel run.
That costs two instructions per node, which is nothing that can be
measured.
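The hashing split described above (pawn-structure scores depend only on pawn placement, so they can be cached in a pawn hash table, while passed-pawn and king-safety terms depend on the other pieces and must be recomputed on every call) can be sketched roughly like this. This is a toy illustration with invented names and a dummy scoring function, not Crafty's actual code:

```c
#include <stdint.h>

/* Sketch of a pawn hash table: scores that depend only on the pawn
   configuration are cached; a key of 0 is assumed unused here. */

#define PAWN_TABLE_SIZE 4096   /* entries; real engines use megabytes */

typedef struct {
    uint64_t key;   /* Zobrist-style hash of the pawn configuration */
    int score;      /* cached pawn-structure evaluation */
} PawnEntry;

static PawnEntry pawn_table[PAWN_TABLE_SIZE];

/* Stand-in for the expensive pawn-structure scan. */
static int evaluate_pawns_slow(uint64_t pawn_key) {
    return (int)(pawn_key % 100) - 50;   /* dummy score */
}

int evaluate_pawns(uint64_t pawn_key) {
    PawnEntry *e = &pawn_table[pawn_key % PAWN_TABLE_SIZE];
    if (e->key == pawn_key)              /* cache hit: skip the rescan */
        return e->score;
    e->key = pawn_key;                   /* miss: compute and store */
    e->score = evaluate_pawns_slow(pawn_key);
    return e->score;
}
```

King-safety and passed-pawn terms get no such cache, which is why (as argued above) evaluation time peaks in positions where those features are present.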
As in < .1%.

>>On June 05, 2002 at 13:32:52, Vincent Diepeveen wrote:
>>
>>>On June 05, 2002 at 04:17:12, Bas Hamstra wrote:
>>>
>>>you forget to mention evaluation. Seems you guys forget
>>>that chess is about evaluation of a position. You need
>>>so much system time for SEE, Makemove and unmaking
>>>moves that it seems you simply have *no time* for evaluation!
>>>
>>>If the capturing routine/SEE in qsearch is eating all of your system
>>>time, then my advice is to NOT use a qsearch, but to use an evaluation
>>>that directly estimates what you could possibly lose. Just 1 piece
>>>of course. Remove that from the score then.
>>>
>>>That's a few clocks more and your thing gets a few million nodes a second,
>>>but for sure searches 3 ply deeper.
>>
>>My results here are already well known. Crafty spends nearly 50% of the total
>>time in Evaluate() and its sub-functions. SEE, etc. are all very small parts.
>>I see Evaluate range from a low of 33% of total search time to a high of just
>>over 55%. Note that in the profile code you have to look carefully to get
>>all of the individual parts of Evaluate() and not just Evaluate() by itself.
>>
>>>>On June 04, 2002 at 20:31:41, Robert Hyatt wrote:
>>>>
>>>>>On June 04, 2002 at 18:01:03, Gian-Carlo Pascutto wrote:
>>>>>
>>>>>>On June 04, 2002 at 17:52:47, Dann Corbit wrote:
>>>>>>
>>>>>>>On June 04, 2002 at 16:28:39, Gian-Carlo Pascutto wrote:
>>>>>>>
>>>>>>>>On June 04, 2002 at 16:18:55, Gian-Carlo Pascutto wrote:
>>>>>>>>
>>>>>>>>>Because you are using a processor that is clocked at twice the clock
>>>>>>>>>frequency? Why compare a 1ghz processor to a (nearly) 2ghz processor
>>>>>>>>>and conclude anything about efficiency there? Is there anything that
>>>>>>>>>suggests that the alpha is simply more "efficient"? To justify that
>>>>>>>>>clock frequency disparity?
>>>>>>>>>
>>>>>>>>>A machine twice as fast (clock freq) _should_ perform just as well as
>>>>>>>>>a 64-bit machine at half the clock frequency.
>>>>>>>>>Less would suggest that the
>>>>>>>>>32 bit machine simply sucks badly.
>>>>>>>>
>>>>>>>>I don't agree with the validity of a clock-for-clock comparison,
>>>>>>>>but if you want to do it anyway, I'll again point to Vincent's
>>>>>>>>numbers:
>>>>>>>>
>>>>>>>>At the same clockspeed, Crafty only gets 33% faster on the 64-bit
>>>>>>>>machine.
>>>>>>>>
>>>>>>>>When you read this, keep in mind that most applications get _more_
>>>>>>>>than 33% faster on the 64-bit machine.
>>>>>>>
>>>>>>>All the new 64 bit chips in the discussion are pretty much beta stage right
>>>>>>>now.
>>>>>>
>>>>>>Not true for the Alpha.
>>>>>
>>>>>Depends on the alpha being discussed. DEC had processors beyond the 21264
>>>>>running, although the 21264 was pretty good. Dann was a bit off on the
>>>>>performance, as Tim Mann was running a 21264 at 600mhz and getting right at
>>>>>1M nodes per second. McKinley is getting 1.5M at 1000mhz, so the alpha might
>>>>>still have a bit of an advantage, but it is pretty small...
>>>>>
>>>>>McKinley is only available to a select few. 21264s are fairly common.
>>>>>Anything beyond that is not readily available...
>>>>>
>>>>>>>So, I think that architecturally, it makes good sense to design for a 64 bit
>>>>>>>system right now.
>>>>>>
>>>>>>That makes sense, if the 64 bit design is actually faster than the
>>>>>>corresponding 32 bit design (even on 64 bit hardware if you wish).
>>>>>>
>>>>>>The case for bitboards is not clear on that matter. Certainly, if
>>>>>>the speedup over non-bitboards is only 33%, they will have a hard time
>>>>>>convincingly beating alternative approaches even on 64 bit hardware.
>>>>>>
>>>>>>--
>>>>>>GCP
>>>>>
>>>>>You are assuming that bitboards are _slower_ than non-bitboard programs on
>>>>>32 bit machines. I haven't seen this demonstrated yet. We can always do some
>>>>>sort of a test.
>>>>>IE, since the most common move generator issue is "generate all
>>>>>captures", we can try that with bitboard and non-bitboard approaches to see
>>>>>if one is really much better than the other on 32 bit machines. I don't
>>>>>think so myself. I think they are pretty equal due to the multiple-pipe
>>>>>issue.
>>>>>
>>>>>But a test could be done to see, since this is the most common thing needed
>>>>>in a chess engine.
>>>>
>>>>That's not a fair test, I think. IMO the most heavily used routines are:
>>>>
>>>>- See()
>>>>- GenCaps()
>>>>- SquareAttacked()
>>>>- Make/Unmake()
>>>>
>>>>You just pick the one in which bitboards are good. In fact it is nearly
>>>>impossible to figure out what's best overall by comparing only parts. What
>>>>you could do, though, is generate "profile data" about a search in average
>>>>middlegame positions, and see how many times each of the above functions is
>>>>being called. Then we could turn this into a sort of benchmark:
>>>>
>>>>10,000 * a()
>>>>8,000 * b()
>>>>5,000 * c()
>>>>3,000 * d()
>>>>
>>>>and compare times for bitboards and 0x88 to do this. This would at least
>>>>tell us if bitboards are faster *for Crafty*.
>>>>
>>>>Best regards,
>>>>Bas.
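Bas's weighted mix could be turned into a tiny harness along these lines. This is only a sketch: the four routines below are empty stubs standing in for the real bitboard or 0x88 primitives, and the pairing of call counts to routines is an assumption based on his example numbers:

```c
#include <time.h>

/* Weighted micro-benchmark as proposed above: call each primitive as
   often as a middlegame profile says it runs, and time the whole mix. */

static volatile long sink;   /* keeps the stub calls from being optimized away */

/* Stub primitives; a real comparison would swap in the bitboard and
   0x88 implementations of See(), GenCaps(), SquareAttacked(), Make/Unmake(). */
static void see(void)             { sink += 1; }
static void gen_caps(void)        { sink += 2; }
static void square_attacked(void) { sink += 3; }
static void make_unmake(void)     { sink += 4; }

typedef struct {
    void (*fn)(void);  /* primitive under test */
    int calls;         /* how often the profile says it is called */
} BenchItem;

/* Run the weighted mix once and return elapsed CPU seconds. */
double run_benchmark(const BenchItem *items, int n) {
    clock_t start = clock();
    for (int i = 0; i < n; i++)
        for (int c = 0; c < items[i].calls; c++)
            items[i].fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running the same `BenchItem` table once linked against the bitboard primitives and once against the 0x88 primitives, with counts taken from a real middlegame profile, would give the per-engine comparison described above.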
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.