Computer Chess Club Archives


Subject: Re: Statistics errors in opening books created with the Shredder8/Fritz-GUI

Author: Stephen A. Boak

Date: 18:25:59 12/28/04


On December 28, 2004 at 20:54:45, William Penn wrote:

>Statistics errors in opening books created with Shredder8/Fritz-GUI
>-------------------------------------------------------------------
>
>I have been creating opening books for quite a while with the Shredder 8
>CB/Fritz-GUI. [I'm talking about the same GUI that comes with Fritz 8, Junior 8,
>etc. - not the "Classic" Shredder GUI. Too bad such ambiguities exist, but
>hopefully that is clear enough.]
>
>The procedure (from the chessboard window in the GUI) is:
>1) File menu, New, Openings book... (creates a new opening book)
>2) Edit menu, Openings Book, Import Games... (imports games from any desired
>database)
>
>The resulting opening book can be displayed in the Book Pane. That is available
>by default as one of the tabs in the Notation pane, along with the Notation and
>Scoresheet tabs. Or it can be displayed separately by selecting "Extra Book
>Pane" from the Window menu, Panes...
>
>For any given position on the chessboard, the opening book shows the number of
>games, percentage scored, average rating, performance, and probabilities -
>overall, and for each of the possible moves in the position from the database
>involved. A histogram (bar graph) can also be displayed at the bottom of the
>Book Pane with the number of white wins, draws, black wins, percentages, etc.
>
>Much to my dismay, I have just discovered that the data and statistics shown are
>inaccurate. There must be a bug somewhere.
>
>While the total number of games in a given position appears to be correct in all
>cases, the other statistics & data for the variations are incorrect. For example
>there may actually only be one (1) game in the database for a particular move
>from the position, but the Book Pane may say that there are 3 games, or 5 games,
>etc. with corresponding incorrect statistics. Search of the database involved
>for that position proves that the displayed statistics are incorrect, which is
>easy to do when only a few games are involved.
>
>This error always displays more games than actually exist in the database.
>Exactly where the "extra" games come from is a mystery, because they do not
>exist in the database. It could be some kind of tree-building error when the
>games are imported. Or it could be an error in how the data is displayed (more
>of a GUI-based error). I don't know which.
>
>I also don't know if the error relates to the Length designated when importing
>games. That dialog box which appears asks you to select an Absolute length or an
>ECO-relative length of 1-100. This determines how big the tree is. The larger
>the Length selected, the bigger (more MB or GB) the resulting opening book.
>I usually choose a Length in the range of 15-35, depending on the size of the
>database.
>
>I also don't know if the error relates to the size of the database involved.
>Most of my databases are large, the largest over 3,000,000 games. Or perhaps
>there is some limit on the size of the resulting book files, but I don't know.
>
>At this point, I also don't know exactly how bad the error is. If it's only a
>handful of games, it wouldn't matter very much when the sample of games in a
>given position is large.
>
>The uncertainties involved make the resulting statistics practically worthless,
>pending further research. Too bad, because it could be quite valuable if
>accurate based on the database involved.
>
>At this point, I'm not aware of any way to avoid this bug. Does anyone know how
>it can be avoided?
>
>My apology for saying "I don't know" so often, but alas there is no detailed
>documentation or instructions for these operations insofar as I'm aware. You
>just have to discover them on your own, then try to get lucky, I guess. Alas
>that is typical of ChessBase documentation. They assume you already know
>everything.
>WP

I have detected similar puzzles in the past.

Some thoughts (ideas only):

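1. Transpositions may mean the position arrived at after making the displayed
tree move was also reached, in other games, from different prior positions.
The tree may count all of those games for that move.

Thus a search from the current position may only show one instance, namely the
game that actually passed through the current position.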

2. Some transpositions may be included only in the notes within a game, but not
as a separately saved database game entry.

3. Reversed color openings may be included in the statistics, but a search may
not find them, even if "equivalent".
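
To make idea 1 concrete, here is a minimal Python sketch. It is only a toy model and rests on assumptions: I am guessing that the book keeps its statistics per position rather than per move sequence, and I have not confirmed that this is how the Fritz GUI actually builds its tree. The names (position_key, build_book, book_count, search_count) and the four sample games are made up for illustration. A "position" is approximated as the set of moves each side has played, ignoring order, which happens to be valid for these quiet opening lines but is not a real chess position key.

```python
from collections import Counter

def position_key(moves):
    # Toy stand-in for a real position hash (assumption): the set of
    # (side, move) pairs played so far, ignoring the order of the moves.
    return frozenset((ply % 2, san) for ply, san in enumerate(moves))

def build_book(games, max_plies):
    # Count, for every position reached within max_plies, how many games reach it.
    book = Counter()
    for game in games:
        for n in range(1, min(max_plies, len(game)) + 1):
            book[position_key(game[:n])] += 1
    return book

def book_count(book, line, move):
    # What a position-keyed book would report for `move` played after `line`:
    # every game that reaches the RESULTING position, by any move order.
    return book[position_key(line + [move])]

def search_count(games, line, move):
    # What a database search shows: games that pass through the position
    # after `line` and then actually play `move` from there.
    n = len(line)
    target = position_key(line)
    return sum(
        1 for g in games
        if len(g) > n and position_key(g[:n]) == target and g[n] == move
    )

# Hypothetical sample games (not taken from any real database).
GAMES = [
    ["d4", "Nf6", "c4", "e6", "Nf3", "d5"],   # passes through the current position
    ["Nf3", "Nf6", "d4", "e6", "c4", "d5"],   # transposes in after 3.c4
    ["c4", "e6", "Nf3", "Nf6", "d4", "d5"],   # transposes in after 3.d4
    ["d4", "Nf6", "c4", "e6", "Nc3", "Bb4"],  # same position, different third move
]
LINE = ["d4", "Nf6", "c4", "e6"]  # the current position on the board

book = build_book(GAMES, max_plies=6)
print("book says :", book_count(book, LINE, "Nf3"))
print("search has:", search_count(GAMES, LINE, "Nf3"))
```

With these sample games the position-keyed count for 3.Nf3 is 3, while only 1 game actually plays 3.Nf3 from the current position. If the real book works along these lines, it would also explain why the displayed count is never lower than what the search finds.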

I suppose a test, creating a small database and corresponding tree, *may* show
something, but then the problem may be hard to replicate.

Let us know if you learn anything more.

Regards,
--Steve





