Author: Robert Hyatt
Date: 08:48:19 08/24/05
On August 24, 2005 at 09:24:30, Tord Romstad wrote:

>On August 24, 2005 at 02:20:40, Joost Buijs wrote:
>
>>I used some positions from the "wm_test" and recorded how long it took to solve
>>them.
>
>I don't know much about parallel search, but intuitively I would expect test
>suites (especially tactical ones) to be a poor way to measure the speedup with
>multiple CPUs. The search tree in a typical test position will often look very
>different from a normal search tree, and it is possible that the parallel search
>efficiency in such positions is very different from the efficiency in average
>positions reached during normal games.
>
>Tord

I agree. There are many issues. Some positions, once you find the correct key move, produce nearly perfectly ordered trees. Parallel searches can usually eat those up with high efficiency. Other positions produce horribly ordered trees, and depending on the parallel search, they can either eat those up too or fall flat and produce horrible performance. The ones in the middle are harder, as move ordering is never perfect, and that can have a really bad effect on parallel search.

That's the reason that for the older DTS paper, I chose to use positions from a real game, which didn't all have a "tactical solution" to search for. I kept getting asked "how does your parallel search perform in a game, not just on a set of random test positions?" That's not an easy question to answer. And if you do answer it, someone will always criticize the result and ask "OK, but how does it perform on a set of unrelated test positions like Nolot or whatever?" :)

Also, as I have shown repeatedly over the past few years here, parallel performance has a _large_ standard deviation on many positions, which means that picking just one or two positions and running each one time with 1 cpu and one time with 8 cpus is a poor way to measure performance.
The data I will provide before long takes the same Cray Blitz game positions from the DTS paper and runs each of them 8 times, averaging the speedup. That means 1 run with one cpu, then 4 runs with 2 cpus, and 8 runs each with 4 and 8 cpus, to try to have enough data that the variance gets averaged out to something meaningful. That burns one heck of a lot of CPU time... I am also running these tests with slightly different internal tuning options, and if I just change two of the tuning parameters, with only two possible settings for each, I now have 4X as many runs to make. That's why this quad opteron is glowing orange in AMD's lab right now. :)
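The measurement scheme above amounts to computing per-run speedups against a single serial baseline and then reporting their mean and scatter. Here is a minimal sketch of that bookkeeping; the timing values are made up for illustration and do not come from the actual Cray Blitz data:

```python
import statistics

def speedup_stats(serial_time, parallel_times):
    """Speedup of each parallel run versus one serial baseline,
    plus the mean and sample standard deviation across runs."""
    speedups = [serial_time / t for t in parallel_times]
    mean = statistics.mean(speedups)
    stdev = statistics.stdev(speedups) if len(speedups) > 1 else 0.0
    return mean, stdev, speedups

# Hypothetical timings (seconds) for one position: one 1-cpu run,
# eight 8-cpu runs showing the run-to-run scatter described above.
serial = 400.0
eight_cpu_runs = [75.0, 90.0, 68.0, 110.0, 82.0, 71.0, 95.0, 79.0]

mean, stdev, runs = speedup_stats(serial, eight_cpu_runs)
print(f"mean speedup: {mean:.2f}  stdev: {stdev:.2f}")
```

Averaging over repeated runs matters because a single run's speedup here ranges from about 3.6x to 5.9x; only the mean over many runs is a stable number.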