Author: Vincent Diepeveen
Date: 14:04:00 12/22/99
On December 21, 1999 at 10:27:29, Robert Hyatt wrote:

>On December 21, 1999 at 09:24:29, Vincent Diepeveen wrote:
>
>>On December 21, 1999 at 08:49:08, Albert Silver wrote:
>>
>>>On December 21, 1999 at 02:05:54, Greg Lindahl wrote:
>>>
>>>>On December 21, 1999 at 01:39:44, Dann Corbit wrote:
>>>>
>>>>>>You are assuming that it's needlessly inefficient? Why?
>>>>>
>>>>>Experience with those I have seen only.
>>>>
>>>>Ah. So you don't know if the program I am referring to is inefficient just
>>>>because it spends 90% of its time in eval. So, nothing much learned, except
>>>>that you know that sometimes that means something bad.
>>>>
>>>>You can probably assume that anyone thinking about sending an eval to silicon
>>>>would make sure it's reasonably optimized first.
>>
>>2 approaches:
>> - running it in software before putting it to hardware
>> - using hardware that's reprogrammable for testing
>>
>>>>-- g
>>>
>>>So what engine uses 90% of its time on the eval? I'm curious.
>>
>>Mine is. And that 90% is just the full eval which is getting used in 40% of
>>the cases. In 60% of the cases i'm getting the eval out of hashtables.
>>
>>> Albert Silver
>
>OK... we _MUST_ speak the same language. You are leading Greg way astray with
>such comments. If you search for 10 minutes, do you spend 90% of the total cpu
>time in your eval, or 40%? You said _both_ above. And both can't be right.
>
>It doesn't matter if slow eval takes 90% when you have to do it, but you only
>have to do it 40% of the time. That is 36% of the total time in eval. Let's
>get on the same page. Greg would be disappointed to expect your program to
>run 10x faster with his hardware eval, when in reality it will only get 40%.
>
>This hardware will be used wherever the eval is done. So out of a 10 minute
>search, how many minutes in the eval for a typical opening, middlegame, and
>endgame position? For Crafty, it is 50%, 35% and 35%, roughly. That is
>obtained by using the profiler in gcc, and running several opening, middlegame
>and endgame positions. Then adding up the cpu time used for _all_ of the eval
>modules, starting at Evaluate() and adding in the functions that it calls.
>
>What kind of numbers do you get? It doesn't sound like 90% to me based on what
>you wrote above...

If Diep runs for 10 minutes, then 90% of its time is spent on evaluation. The
mistake people make is to assume that if they spend 45% of the system time on
their eval, then my eval is just 2 times bigger. It's not 2 times bigger; it's a
lot bigger than that. That 90% is full evals only. Of the 10% that remains,
roughly 5% goes to hashtables and 5% to sorting, search and move generation
(move generation is 0.6%).

I'm not really interested in how much it is in the endgame. It's simply a lot
less. Why am I not interested? Simple: I already search deep enough in the
endgame. It can't hurt to search deeper, but searching deeper in the endgame is
not going to bring much more, as Diep already gets quite deep there.

So if you present numbers to Greg, use that 90% number. The percentages after
that are not important. All the tactical and positional problems deriving from
a small search depth are in the opening and middlegame, as we all know too
well. There the search in itself is not the obstacle; of course the eval and
EGTBs usually are.

Note that these times are measured for the single-cpu version on a PIII. The
parallel version is another 3% slower in the opening/middlegame, JUST BECAUSE
of locking and some datastructures I keep track of, and up to 5% slower in the
endgame.
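Coming back to the 90% versus 40%/60% point, since that is where the confusion
sits: the 90% is a fraction of cpu time, while the 40% and 60% are fractions of
evaluation calls (a full eval versus a hit in the eval hashtable). A minimal
sketch of such an eval hashtable, in C, could look like the following; this is
not Diep's actual code, and Position, hashkey and EvaluateFull() are
hypothetical names used only for illustration.

    #include <stdint.h>

    #define EVAL_CACHE_BITS 20
    #define EVAL_CACHE_SIZE (1u << EVAL_CACHE_BITS)

    /* hypothetical, stripped-down position type; only the hash key matters here */
    typedef struct Position { uint64_t hashkey; } Position;

    typedef struct {
        uint64_t key;    /* full position hash, used to verify a hit */
        int      score;  /* cached evaluation score                  */
    } EvalEntry;

    static EvalEntry eval_cache[EVAL_CACHE_SIZE];

    extern int EvaluateFull(const Position *pos);  /* the slow, complete eval */

    int Evaluate(const Position *pos)
    {
        EvalEntry *e = &eval_cache[pos->hashkey & (EVAL_CACHE_SIZE - 1)];

        if (e->key == pos->hashkey)      /* cache hit: the ~60% case           */
            return e->score;

        int score = EvaluateFull(pos);   /* the ~40% case that eats ~90% of cpu */
        e->key   = pos->hashkey;
        e->score = score;
        return score;
    }

In a scheme like this, making EvaluateFull() slower or bigger raises the cpu
fraction spent in eval without changing how often it is actually computed,
which is exactly why the two percentages can both be right.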
On your Xeon I could hardly see a difference between the single-cpu version and
the parallel version. On the Celeron the difference was a bit bigger though: in
the opening it is about 5%. So if we assume Greg works with a Xeon or an Alpha
21264, the expectation is that there is hardly any difference between the
parallel version and the single-cpu version. It would surprise me if the
difference on the 21264 between the parallel version and the single version
(both at 1 cpu) were bigger than, say, 1%.

The more processors get added, though, the more problems I expect with tree
locking. My parallelism should run fine on n processors, but it's obviously
written to get the best speedup on systems of 2 to 4 processors. I never tested
on more processors than that. I don't even know the speedup of the current
version at 4 processors. I only know that at blitz with 2 processors it gets
1.8, and that after 2 minutes it gets 2.0 on average. The whole calculation
obviously depends on what system is being used.

In my search I'm doing some pretty stupid things (from the eval-in-hardware
viewpoint) in order to prevent more evaluations. By getting rid of them and
simply getting an evaluation from hardware, I'm sure I can easily speed up my
search a lot.

However, the mainline conclusion is a tough one. I tend to agree with Bob that
running on a cluster, with or without FPGA, is not very smart as it seems now.
Evaluation in an FPGA I'm still researching, but from what I've seen/heard so
far it is all very tricky and hard to figure out. General-purpose processors
get better and better. Parallelism will play a big role in the future (though
it's hard for me to see when). It's going to get harder and harder to design a
hardware processor that can compete with software. Note that the machines the
big factories use to produce processors also get more expensive each year,
surpassing economic inflation by a large margin. So logically, making hardware
chess processors is going to get harder each year when expressed in money.

So I tend to agree with Bob: just improve the software, hardware is not an
option right now!

Vincent
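The two comparisons above come down to simple ratios of wall-clock time for the
same search: the parallel binary on 1 cpu against the single-cpu binary isolates
the locking/bookkeeping overhead (the 1-5% mentioned), and the 2-cpu run against
the single-cpu binary gives the speedup (the 1.8-2.0 mentioned). A minimal C
sketch with made-up placeholder timings, not Diep's measurements:

    #include <stdio.h>

    int main(void)
    {
        /* placeholder wall-clock times (seconds) for the same fixed-depth
           search; NOT actual Diep measurements                            */
        double t_single   = 120.0;   /* single-cpu binary                  */
        double t_par_1cpu = 124.0;   /* parallel binary, run on 1 cpu      */
        double t_par_2cpu =  66.0;   /* parallel binary, run on 2 cpus     */

        /* locking/bookkeeping overhead: parallel vs single binary, 1 cpu  */
        printf("parallel overhead on 1 cpu: %.1f%%\n",
               100.0 * (t_par_1cpu - t_single) / t_single);

        /* speedup of the 2-cpu run relative to the single-cpu binary      */
        printf("speedup on 2 cpus: %.2f\n", t_single / t_par_2cpu);

        return 0;
    }

With these placeholder numbers the overhead comes out at about 3% and the
speedup at about 1.8, in the same range as the figures quoted above.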