Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: APHID , advances in ICCA #8

Author: Vincent Diepeveen

Date: 11:45:45 09/18/01

Go up one level in this thread


On September 18, 2001 at 12:46:23, Robert Hyatt wrote:

>On September 18, 2001 at 11:54:45, Vincent Diepeveen wrote:
>
>>On September 18, 2001 at 11:07:27, Gian-Carlo Pascutto wrote:
>>
>>>On September 18, 2001 at 09:10:02, Vincent Diepeveen wrote:
>>>
>>>>Well if no one here manages to do that, who am i to say that the
>>>>remainder of this algorithm is worth trying?
>>>
>>>Perhaps just noone bothered?
>>>
>>>>Which means that APHID already says who has to search what before relevance
>>>>of parallel splitting has been indicated. Considering that nowadays we
>>>>use nullmove bigtime, this makes APHID completely outdated, because
>>>>it in short doesn't wait at all!
>>>
>>>They Crafty they used had nullmove R=2. Nowadays most people (I know
>>>you don't, but you aren't everybody even if you continously think so)
>>>use R=2/R=3. I'm pretty sure that isn't going to make the difference
>>>between usefull and useless.
>>
>>All testresults they published are based upon 5 to 8 ply searches
>>and based upon a program called TheTurk. No mention of crafty speedups,
>>and if they are then it's still 5 to 8 ply.
>>
>>Who cares for 5 to 8 ply speedups in the year 2001?
>>
>>My speedup is horrible at 7 ply, no kidding, but look how my speedup is
>>at bigger depths!
>
>My speedup is no different at depth=7 than it is at depth=10.  It is a bit
>more efficient at deeper depths, but not 2x.  Or 1.5x even.



>Here's some data:

>searched to depth=9.  I can't really time depth=7
>searches as they take a few hundredths of a second and that
>is not very accurate.

>1cpu:  2.80
>4cpu:  1.02

So that's for Bxe4 trick about 2.75

>1cpu:  3.92
>4cpu:  1.39

So for f6 it is 2.8

>Now for 12 plies:
>
>1cpu:  87.00 secs
>4cpu:  21.08 seconds

So now it is for Bxe4 trick about 4.0 speedup

This is a huge difference!

>1cpu: 100.00 secs
>4cpu:  40.17 secs

Now it is 2.5 but we do not know in either of the both cases your
search output, so this is not very meaningful data. if your root move
flipped a time more here or if your score changed this can have
devastating influence at speedup!

I did the same test for DIEP at Bxe4 position, note that i only have
2 processors at most at home so i can't try 4 regrettably but here the
output first for single cpu:

00:00 0 0 13 (0) 1 -1.661 Bb7xe4 Bd3xe4 Nf6xe4 Qe2xe4 Qc7xc4 Qe4xc4 Rc8xc4
00:00 0 0 26 (0) 1 0.269 Rf8-e8
00:00 0 0 31 (0) 1 0.334 Nd7-c5 Na4xc5 Qc7xc5
00:00 0 0 114 (0) 2 0.334 Nd7-c5 Na4xc5 Qc7xc5
00:00 0 0 411 (0) 3 0.334 Nd7-c5 Na4xc5 Qc7xc5
00:00 0 0 1187 (0) 4 0.013 Nd7-c5 Na4xc5 Qc7xc5 Bb2-d4
++ f8-e8
00:00 0 0 3227 (0) 4 0.167 Rf8-e8 Bb2-d4 Re8-d8 Ra1-c1
++ f8-d8
00:00 0 0 4483 (0) 4 0.286 Rf8-d8 Bb2-d4 Nd7-c5 Na4xc5 d6xc5
00:00 0 0 7216 (0) 5 0.286 Rf8-d8 Bb2-d4 Nd7-c5 Na4xc5 d6xc5
00:00 0 0 18186 (0) 6 0.216 Rf8-d8 Bb2-d4 Kg8-f8 g2-g3 Nd7-c5 Na4xc5 d6xc5
00:02 0 0 95675 (0) 7 0.221 Rf8-d8 Bb2-d4 Kg8-f8 g2-g3 Kf8-g8 Ra1-b1 Nd7-c5 Na4x
c5 d6xc5
++ b7-e4
00:02 0 0 132048 (0) 7 0.765 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Bb2xf6 Be7xf6 Be
4-c6 Rc4xa4 Bc6xd7 Bf6xa1 Rd1xa1
00:03 0 0 181641 (0) 8 1.006 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4xe4 Bb
2-d4
00:05 0 0 334446 (0) 9 1.055 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Rd1-d4 Rc4xd4 Bb
2xd4 Nf6xe4 Na4xb6
00:09 0 0 595985 (0) 10 0.661 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4xe4 N
b6xd7 Nf6xd7 Bb2-c3
00:21 0 0 1479280 (0) 11 1.181 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4xe4
Nb6xd7 Nf6xd7 a3-a4 Rf8-c8
00:36 0 0 2616558 (0) 12 0.886 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4xe4
Nb6xd7 Nf6xd7 Nf3-d2 Re4-a4 g2-g3
01:32 0 0 6542438 (0) 13 1.072 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4xe4
Nb6xd7 Nf6xd7 Nf3-d4 Be7-f6 a3-a4 d6-d5
02:54 0 0 12694811 (0) 14 0.724 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4xe4
 Nb6xd7 Nf6xd7 Nf3-d4 Be7-f6 f2-f3 Re4-e3 a3-a4
16:25 0 0 57592380 (0) 15 0.949 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4xe4
 Nb6xd7 Nf6xd7 Nf3-d4 Be7-f6 f2-f3 Re4-e3 Kg1-f2 Bf6-g5

Now the parallel version:
00:00 0 0 13 (0) 1 -1.661 Bb7xe4 Bd3xe4 Nf6xe4 Qe2xe4 Qc7xc4 Qe4xc4 Rc8xc4
00:00 0 0 26 (0) 1 0.269 Rf8-e8
00:00 0 0 31 (0) 1 0.334 Nd7-c5 Na4xc5 Qc7xc5
00:00 0 0 109 (0) 2 0.334 Nd7-c5 Na4xc5 Qc7xc5
00:00 0 0 406 (0) 3 0.334 Nd7-c5 Na4xc5 Qc7xc5
00:00 0 0 1182 (0) 4 0.013 Nd7-c5 Na4xc5 Qc7xc5 Bb2-d4
++ f8-e8
00:00 0 0 3122 (717) 4 0.167 Rf8-e8 Bb2-d4 Re8-d8 Ra1-c1
++ f8-d8
00:00 0 0 4412 (856) 4 0.286 Rf8-d8 Bb2-d4 Nd7-c5 Na4xc5 d6xc5
00:00 0 0 7150 (856) 5 0.286 Rf8-d8 Bb2-d4 Nd7-c5 Na4xc5 d6xc5
00:00 0 0 17603 (2303) 6 0.196 Rf8-d8 Bb2-d4 Bb7-a8 Ra1-b1 Nd7-c5 Na4xc5 d6xc5
00:01 0 0 92406 (9177) 7 0.214 Rf8-d8 Bb2-d4 Bb7-a8 Qe2-e3 Nd7-c5 Na4xc5 d6xc5
++ b7-e4
00:02 0 0 155152 (14850) 7 0.765 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Bb2xf6 Be7xf
6 Be4-c6 Rc4xa4 Bc6xd7 Bf6xa1 Rd1xa1
00:03 0 0 201232 (15711) 8 1.006 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4xe
4 Bb2-d4
00:05 0 0 345901 (17111) 9 1.055 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Rd1-d4 Rc4xd
4 Bb2xd4 Nf6xe4 Na4xb6
00:08 0 0 601497 (20381) 10 0.661 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4x
e4 Nb6xd7 Nf6xd7 Bb2-c3
00:14 0 0 1407418 (20427) 11 1.181 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4
xe4 Nb6xd7 Nf6xd7 a3-a4 Rf8-c8
00:21 0 0 2484003 (20675) 12 0.886 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc4
xe4 Nb6xd7 Nf6xd7 Nf3-d2 Re4-a4 g2-g3
00:46 0 0 5861681 (163344) 13 1.072 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 Rc
4xe4 Nb6xd7 Nf6xd7 Nf3-d4 Be7-f6 a3-a4 d6-d5
01:27 0 0 11748805 (387481) 14 0.724 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6 R
c4xe4 Nb6xd7 Nf6xd7 Nf3-d4 Be7-f6 f2-f3 Re4-e3 a3-a4
04:52 0 0 36402494 (1490825) 15 0.949 Bb7xe4 Bd3xe4 Qc7xc4 Qe2xc4 Rc8xc4 Na4xb6
Rc4xe4 Nb6xd7 Nf6xd7 Nf3-d4 Be7-f6 f2-f3 Re4-e3 Kg1-f2 Bf6-g5

At 8 ply i do not notice any speedup.

But the speedup clearly slowly goes up when depth goes up.

So the speedup goes slowly to 2.0 to > 2.0 at depth==15 and no modifications
in PV and no changes in root score.

Speedup at 15 ply is actually 16:52 / 4:52 = 1012 / 292 = 3.47 speedup.

Now this is an easy position to get a good speedup of course, so don't think
that my *average* speedup is 3.47.

Note that the above diep version i used is a version *without*
dangerous extensions.

>In each of the two positions tested above, I ran the thing with one
>processor and then with 4 processors.  Everything else
>remained exactly the same.  The first position is kopec 22,
>the second is kopec23.  Chosen at random for no good reason.

Easy move ordering in the first position and score is not
slowly going down so i'm not so sure whether
it's smart to use this own, but here the results from DIEP speak
for itself. Also at one of the positions you tried you get a speedup increase.

Now i'll be the last to say that the worst case speedup from crafty at 4
processors is very good. But definitely there is evidence for you to
keep testing also but not with APHID :)

>I can do perfectly fine with very short searches.  Or very long searches.

I find 2.75 at short search versus 4.0 at longer search very clearly
not what you conclude.

>But remember that you and I are not splitting the tree in the same way,
>because I can split at the root when it is reasonable to do so.  that
>helps the speedup quite a bit.

It doesn't work for me and i find it very dangerous. Here in this position
the ply after you have found Bxe4 this is not a big problem of course.

>
>>
>>I don't want to crack down results from the APHID team back in 1996,
>>but we must clearly realize that what worked back then is NOT going to
>>work in 2001 anymore!
>
>Again, would you care to make a small wager?  They got about 33% efficiency
>on their hardware.  I think that can be improved on with some more work.  They
>only wanted to get it to work, not work very efficiently.  And I think that
>50% efficiency is doable.  And on my giganet hardware I think that well beyond
>50% is very doable.

33% with 8 ply searches in 1996 with a program from 2500 nodes a second at
16 expensive workstations, i find that very poor!

Even if you throw over the ball by hand you can get a better speedup at
8 ply with that a slow speed!

I got single cpu end 1996 at a pentiumpro200 already 5000-10000 nodes
a second, and i'm pretty sure that DIEP back in 1996 also was not a very
fast program, they needed 16 very expensive
workstations to get 2500 nodes a second in total!

If i would slow down diep factor 100 or something, then i'm sure that i
can even get a speedup over the internet with it at 16 nodes better than
what these guys showed!

>
>>
>>>>This refers to the fact that YBW for each node needs a result for the
>>>>first move first before splitting other moves. Now this is of course very
>>>>true, but this directly also shows that APHIDs branching factor is by
>>>>definition bigger than algorithms which use YBW.
>>>
>>>I don't think they make any claims that their branching factor is better
>>>than YBW. Would seem pretty silly to me. But then again, YBW probably
>>>gets killed in different ways when having to deal with slow interprocess
>>>comms on a cluster.
>
>YBW doesn't exclude clusters.  In a tree search, the number of "split"
>operations has to be constrained so that the overhead of splitting doesn't
>dominate the work done in actually searching the tree.  I control this in
>Crafty by limiting how near the tips of the branches I can split the tree.

Yes that's true, but you might underestimate something very important
here. You use nullmove bigtime in crafty. That means that missing
transpositiontable lookups from other processes last few plies means that
you definitely have a problem.

Also another small thing is that each processor has a local hashtable;
Rainer Feldmann said: "let's see all the RAM of each node as one big
distributed transpositiontable".

Now that means that his hashtable was n times bigger in his approach
than might be the case in your current plans!

>For a distributed approach, I intend on a second bound, which
>limits distributed
>splits even more severely, because the overhead is going to be higher in net
>communication/latency.

In that case you cannot distribute hashtable even. Also the branching factor
is going to be huge and the time needed to split a subtree is going to be
seconds.

To avoid loads of idle time this would mean that a search is going to need
tens of minutes otherwise the relation between idling and searching is
going to be bigtime in the advantage of the idling.

>I think the difference between myself and Vincent is that I am in the business
>of trying to find out _how_ to accomplish something, no matter how difficult it
>might seem at first glance.  I don't assume everything is impossible if I
>haven't done it myself.  I assume everything is possible, until I prove that it
>can't be done by trying every possible alternative I can think of.  IE I would
>rather drive a nail using a hammer.  But I have done it using pliers, a wrench,
>a rock, a screwdriver handle, and even a C-clamp.

Well i'm from europe and i we're more careful here,
i already have done huge calculations and many network
experiments on whether it was possible
  a) to get diep working over the network (100mbits) with a very limited
     number of nodes (n=2)
  b) possibility to not have a completely shared hashtable
  c) distribution over the net

Of course c) i didn't even start experimenting with because the 100mbits
already proved not enough for nowadays search depths. Also the big bummers
for distributed DIEP was the big system time wasted to network communication.

On paper it is little, in reality this slows down a program X times.

If you don't slow down the problem is branching factor as i clearly
can show here. My speedup is 2.0 or more at bigger depths.

My speedup sucks at 5-8 ply. At 8 ply in fact the speedup here reads 0.00

Speedup at bigger depths 2.0 though.

>It just takes a bit more
>thought and work, but it isn't impossible.  Now nails are being driven by
>compressed air, EM impulse, blanks in a gun using the nail as the projectile,
>etc.  All probably thought "impossible" 40 years ago.
>
>
>
>>>
>>>--
>>>GCP



This page took 0.08 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.