Author: Vincent Diepeveen
Date: 14:04:00 12/22/99
On December 21, 1999 at 10:27:29, Robert Hyatt wrote:

>On December 21, 1999 at 09:24:29, Vincent Diepeveen wrote:
>
>>On December 21, 1999 at 08:49:08, Albert Silver wrote:
>>
>>>On December 21, 1999 at 02:05:54, Greg Lindahl wrote:
>>>
>>>>On December 21, 1999 at 01:39:44, Dann Corbit wrote:
>>>>
>>>>>>You are assuming that it's needlessly inefficient? Why?
>>>>>
>>>>>Experience with those I have seen only.
>>>>
>>>>Ah. So you don't know if the program I am referring to is inefficient just
>>>>because it spends 90% of its time in eval. So, nothing much learned, except
>>>>that you know that sometimes that means something bad.
>>>>
>>>>You can probably assume that anyone thinking about sending an eval to silicon
>>>>would make sure it's reasonably optimized first.
>>
>>2 approaches:
>> - running it in software before putting it to hardware
>> - using hardware that's reprogrammable for testing
>>
>>>>-- g
>>>
>>>So what engine uses 90% of its time on the eval? I'm curious.
>>
>>Mine is. And that 90% is just the full eval which is getting used in 40% of
>>the cases. In 60% of the cases i'm getting the eval out of hashtables.
>>
>>> Albert Silver
>
>OK... we _MUST_ speak the same language. You are leading Greg way astray with
>such comments. If you search for 10 minutes, do you spend 90% of the total cpu
>time in your eval, or 40%? You said _both_ above. And both can't be right.
>
>It doesn't matter if slow eval takes 90% when you have to do it, but you only
>have to do it 40% of the time. That is 36% of the total time in eval. Let's
>get on the same page. Greg would be disappointed to expect your program to
>run 10x faster with his hardware eval, when in reality it will only get 40%.
>
>This hardware will be used wherever the eval is done. So out of a 10 minute
>search, how many minutes in the eval for a typical opening, middlegame, and
>endgame position? For Crafty, it is 50%, 35% and 35%, roughly. That is
>obtained by using the profiler in gcc, and running several opening, middlegame
>and endgame positions. Then adding up the cpu time used for _all_ of the eval
>modules, starting at Evaluate() and adding in the functions that it calls.
>
>What kind of numbers do you get? It doesn't sound like 90% to me based on what
>you wrote above...

If Diep runs for 10 minutes, then 90% of its time is spent on evaluation. The
mistake people make is to assume that if they spend 45% of the system time on
their eval, then my eval is just 2 times bigger. It's not 2 times bigger; it's a
lot bigger than that. That 90% is full evals only. Of the 10% that remains,
roughly 5% goes to hashtables and 5% to sorting, search and move generation
(move generation is 0.6%).

I'm not really interested in how much it is in the endgame. It's simply a lot
less. Why am I not interested? Simple: I already search deep enough in the
endgame. It can't hurt to search deeper, but searching deeper in the endgame is
not going to bring much more, as Diep already gets quite deep there.

So if you present numbers to Greg, use that 90% number. The percentages after
that are not important. All the tactical and positional problems deriving from
a small search depth are in the opening and middlegame, as we all know too
well. There the search in itself is not the obstacle; of course the eval and
EGTBs usually are.

Note that these times are measured for the single-cpu version on a PIII. The
parallel version is another 3% slower in the opening/middlegame, JUST BECAUSE
of locking and some datastructures I keep track of, and up to 5% slower in the
endgame.
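Coming back to the 90% versus 40%/60% point, since that is where the confusion
sits: the 90% is a fraction of cpu time, while the 40% and 60% are fractions of
evaluation calls (a full eval versus a hit in the eval hashtable). A minimal
sketch of such an eval hashtable, in C, could look like the following; this is
not Diep's actual code, and Position, hashkey and EvaluateFull() are
hypothetical names used only for illustration.

    #include <stdint.h>

    #define EVAL_CACHE_BITS 20
    #define EVAL_CACHE_SIZE (1u << EVAL_CACHE_BITS)

    /* hypothetical, stripped-down position type; only the hash key matters here */
    typedef struct Position { uint64_t hashkey; } Position;

    typedef struct {
        uint64_t key;    /* full position hash, used to verify a hit */
        int      score;  /* cached evaluation score                  */
    } EvalEntry;

    static EvalEntry eval_cache[EVAL_CACHE_SIZE];

    extern int EvaluateFull(const Position *pos);  /* the slow, complete eval */

    int Evaluate(const Position *pos)
    {
        EvalEntry *e = &eval_cache[pos->hashkey & (EVAL_CACHE_SIZE - 1)];

        if (e->key == pos->hashkey)      /* cache hit: the ~60% case           */
            return e->score;

        int score = EvaluateFull(pos);   /* the ~40% case that eats ~90% of cpu */
        e->key   = pos->hashkey;
        e->score = score;
        return score;
    }

In a scheme like this, making EvaluateFull() slower or bigger raises the cpu
fraction spent in eval without changing how often it is actually computed,
which is exactly why the two percentages can both be right.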
On your Xeon I could hardly see a difference between the single-cpu version and
the parallel version. On the Celeron the difference was a bit bigger though: in
the opening it is about 5%. So if we assume Greg works with a Xeon or an Alpha
21264, the expectation is that there is hardly any difference between the
parallel version and the single-cpu version. It would surprise me if the
difference on the 21264 between the parallel version and the single version
(both at 1 cpu) were bigger than, say, 1%.

The more processors get added, though, the more problems I expect with tree
locking. My parallelism should run fine on n processors, but it's obviously
written to get the best speedup on systems of 2 to 4 processors. I never tested
on more processors than that. I don't even know the speedup of the current
version at 4 processors. I only know that at blitz with 2 processors it gets
1.8, and that after 2 minutes it gets 2.0 on average. The whole calculation
obviously depends on what system is being used.

In my search I'm doing some pretty stupid things (from the eval-in-hardware
viewpoint) in order to prevent more evaluations. By getting rid of them and
simply getting an evaluation from hardware, I'm sure I can easily speed up my
search a lot.

However, the mainline conclusion is a tough one. I tend to agree with Bob that
running on a cluster, with or without FPGA, is not very smart as it seems now.
Evaluation in an FPGA I'm still researching, but from what I've seen/heard so
far it is all very tricky and hard to figure out. General-purpose processors
get better and better. Parallelism will play a big role in the future (though
it's hard for me to see when). It's going to get harder and harder to design a
hardware processor that can compete with software. Note that the machines the
big factories use to produce processors also get more expensive each year,
surpassing economic inflation by a large margin. So logically, making hardware
chess processors is going to get harder each year when expressed in money.

So I tend to agree with Bob: just improve the software, hardware is not an
option right now!

Vincent
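The two comparisons above come down to simple ratios of wall-clock time for the
same search: the parallel binary on 1 cpu against the single-cpu binary isolates
the locking/bookkeeping overhead (the 1-5% mentioned), and the 2-cpu run against
the single-cpu binary gives the speedup (the 1.8-2.0 mentioned). A minimal C
sketch with made-up placeholder timings, not Diep's measurements:

    #include <stdio.h>

    int main(void)
    {
        /* placeholder wall-clock times (seconds) for the same fixed-depth
           search; NOT actual Diep measurements                            */
        double t_single   = 120.0;   /* single-cpu binary                  */
        double t_par_1cpu = 124.0;   /* parallel binary, run on 1 cpu      */
        double t_par_2cpu =  66.0;   /* parallel binary, run on 2 cpus     */

        /* locking/bookkeeping overhead: parallel vs single binary, 1 cpu  */
        printf("parallel overhead on 1 cpu: %.1f%%\n",
               100.0 * (t_par_1cpu - t_single) / t_single);

        /* speedup of the 2-cpu run relative to the single-cpu binary      */
        printf("speedup on 2 cpus: %.2f\n", t_single / t_par_2cpu);

        return 0;
    }

With these placeholder numbers the overhead comes out at about 3% and the
speedup at about 1.8, in the same range as the figures quoted above.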