Author: Vincent Diepeveen
Date: 14:48:01 04/18/01
Go up one level in this thread
On April 18, 2001 at 06:03:34, Uri Blass wrote: >On April 18, 2001 at 05:20:03, Vincent Diepeveen wrote: > >>On April 17, 2001 at 15:40:58, Uri Blass wrote: >> >>>Suppose program X is playing against kramnik. >>> >>>I think that it may be a good idea to generate a program that generates a tree >>>and sends all the positions in the tree to people who want to help the computer >>>in the match(I will call it the main program). >>> >>>The people give program X to analyze the position and return score and depth to >>>the main program. >>> >>>The main program is using all the information about scores and depthes to decide >>>about the best move. >> >>So a beginner rated 600 is very happy about a certain move, >>and as a real patzer he returns infinite score for his move. > >Sorry but a beginner is not going to give information. >Only program X is going to give depthes and score for positions and program X is >a constant program. > >If program X is a preprocessor it should be changed to get not only a position >to analyze but also to get the root position otherwise it is not going to >analyze position from the same point of view. I see no advantages in parallellizing a true preprocessor, but if you talk about wanting to parallellize a program, then it's obvious that you need more as 1 cpu to speed it up! >Maybe you misunderstood me because you thought that program X is not a constant >program because X means a variable. >Maybe I should use the letter C to be more clear. If you talk about the same program which is going to get used to parallellize a search tree in an SMP way, so not in an assymmetric way, then you can look in some issues of ICCAJ and see a lot of different ways to parallellize the searchtree! I don't need to mention that parallellizing over networks is harder to generate a speedup as communications are more expensive, as that's very obvious. The vaste majority of all computers at www.top500.org are going to be slower as a 16 processor alpha with shared memory for sure, despite that the first few machines each costed several billion US$ to build. The bandwidth on those machines is huge, so definitely there are going to be algorithms that speed you up a lot after very hard work. You can of course also run parallel over the internet, but sometimes it takes for me 10 minutes to just get connected to the internet. If my program would be part of a crucial search tree, then obviously sometimes you have 15 minutes delay for just a small part of the search tree if you would want to wait 15 minutes for answer. The more costly communication is the smaller the speedup in general, but in case of using internet connections the biggest problem isn't even communication SPEED. It is the uncertainty of the communication and the length of timeouts. How many plies must i let a program search? If i give each program 10 ply, what timeout do i give it to let it finish before giving someone else the same job? 1 minute? 2 minutes? Suppose this is a hard search tree and that no free node (remote machine in this case) is able to search the search tree in 2 minutes. Then my program might be STUCK. It is perhaps HUNG forever. As it will to all machines deliver this search tree with the demand that it must be finished within 2 minutes. On the other hand i can only let the other 500 nodes search further when i get back results from THIS node. As otherwise i would be searching with a branching factor of 40 instead of a branching factor of around 5 or so (which is already very realistic for a search which is carried out without shared hashtables). Still i didn't touch the other subject. If my search algorithm has a huge branching factor, then for small searches this means searching over the internet is fast getting to a certain depth. But suppose we get 10 ply anyway very quick. Then let's look to plies 11 to 20. Suppose i search with my normal program with a branching factor of 3. Which is a branching factor which is very realistically on a shared memory machine with nullmove. 3^10 = 59049 Now the internet search is going to need: 5^10 = 9765625 So the total number of Mhz on the internet needs to be a factor 9765625/59049 = 165 faster There is a big penalty for a parallel search which is not sharing hashtable. I think that this problem was very clearly mentionned already over 10 years ago by Rainer Feldmann. >Uri
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.