Computer Chess Club Archives


Subject: Re: Crafty Benches

Author: Robert Hyatt

Date: 08:18:00 09/03/02


On September 03, 2002 at 11:01:51, Aaron Gordon wrote:

>On September 03, 2002 at 10:29:00, Robert Hyatt wrote:
>
>>I already do that.  I have about 30 positions that I use during the profile
>>stage of "make profile"...
>>
>>I'm not sure what the -openmp is supposed to do since I am not using any
>>of the openMP pre-processing directives for parallel programming...
>>
>>By the way, I also use -fno-alias since I don't write that kind of sloppy
>>code...
>>
>>I will try the -O3.  Last time I did it slowed down just a bit so I stuck
>>with -O2...
>
>Ah ok, just wondering. I only see:
>CFLAGS='$(CFLAGS) -D_REENTRANT -O2 \
>                        -prof_use -prof_dir ./profile -fno-alias -tpp6' \
>in the crafty makefile. I just assumed you used the same on your quad. -O3 may
>just depend on which version you're compiling. I had a similar experience.. I
>found -O3 was a hair faster so I just stuck with it. I haven't compared it to
>-O2 lately. If you find -O3 is slightly slower then I have a lot of recompiling
>to do. :)
>
>Also something else I was curious about.. perhaps you'd be able to shed some
>insight on. When I was doing the profiling for Slate's dual box I found the
>fastest way to make the binaries was by using those settings *BUT* run the SMP
>tests on my machine (single CPU). I just profiled with normal settings, exited.
>Then I profiled again with smpmt 2 and exited (making two dyn's). After
>recompiling it ended up being a good deal faster than when Slate profiled it on
>his box using his dual cpus.
>

First, you should not profile using mt=2, for several reasons.

1.  It will on occasion corrupt the profile files.  Apparently the way the
compiler's profiling code writes them is not completely thread-safe.

2.  It can produce misleading results.  I do _all_ profiling using one cpu
because parallel search is so non-deterministic: different profile runs
on a parallel code will produce different results, and hence slightly different
optimizations.  I want repeatability whenever possible.

Second, you should profile a variety of positions.  I use Fine 70 as a simple
endgame example, and the starting position to be sure that I profile the
EvaluateDevelopment() code.  I have several opening, middlegame and endgame
positions just to be sure that all the significant stuff gets hit during the
profile run...
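
As a rough illustration of the advice above (one cpu, a mix of positions), here
is a minimal Python sketch that builds a command script to feed to the
instrumented binary.  The helper name build_profile_script, the depth value and
the exact command spellings (mt=1, sd, setboard, go, quit) are assumptions for
illustration and should be checked against the Crafty version in use; only the
starting position and Fine 70 are included, where a real run would list many
more positions.

```python
# Hypothetical helper: builds a command script for a single-threaded
# profile run over a list of positions. The command spellings below
# (mt=, sd, setboard, go, quit) are assumptions, not verified against
# any particular Crafty release.

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
FINE_70_FEN = "8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - - 0 1"


def build_profile_script(positions, depth=10, threads=1):
    """Return the text of a command script that searches each position once."""
    lines = [
        f"mt={threads}",   # keep the search serial so runs are repeatable
        f"sd {depth}",     # fixed depth, so timing does not change the search
    ]
    for fen in positions:
        lines.append(f"setboard {fen}")
        lines.append("go")
    lines.append("quit")
    return "\n".join(lines) + "\n"


script = build_profile_script([START_FEN, FINE_70_FEN])
print(script)
```

Searching to a fixed depth rather than a fixed time is a deliberate choice in
this sketch: with a serial search it should make every profile run visit the
same tree, which is the repeatability being asked for.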



>I have a guess as to why it could do this but I do not know enough about it to
>explain it properly. I'm thinking maybe the way I did it provided a "synthetic"
>dual cpu box which operated properly throughout the entire test with two
>threads, providing better profiling information. Think that is possible? I think
>Slate mentioned something about SMP doing random things, cutting off stuff and
>whatnot. I'm not sure if it did that when I profiled but that's why I'm guessing
>at this. Otherwise I haven't a clue. :)

The SMP search is highly non-deterministic in its behavior.  The serial
search will repeat the same search every time, which is why I prefer to
profile that behavior...




Last modified: Thu, 15 Apr 21 08:11:13 -0700
