Computer Chess Club Archives



Subject: Re: Crafty Static Evals 2 questions

Author: martin fierz

Date: 09:13:34 02/25/04



On February 25, 2004 at 11:43:07, Robert Hyatt wrote:

>On February 25, 2004 at 11:25:50, martin fierz wrote:
>
>>On February 25, 2004 at 10:58:36, Robert Hyatt wrote:
>>
>>>On February 25, 2004 at 09:13:43, martin fierz wrote:
>>>
>>>>On February 25, 2004 at 07:02:11, Dieter Buerssner wrote:
>>>>
>>>>>On February 25, 2004 at 05:56:16, martin fierz wrote:
>>>>>
>>>>>>it won't pop *my* eyes. i once reduced hash key sizes in my checkers program
>>>>>>beyond all sensible settings, because there was a discussion here about whether
>>>>>>you really need 64-bit keys. in my checkers program, i have 64 bit keys, but
>>>>>>effectively it's only using about 52 bits. i have about a 20 bit part which is
>>>>>>used for the hash index with %, and of the remaining 44 bits i store only 32 as a
>>>>>>check. i reduced those 32 down to about 8 (!!) bits and in 100 test positions
>>>>>>only saw one different move played IIRC. ridiculous, i must have lots of
>>>>>>collisions there. unfortunately, i didn't count the collision number, or write
>>>>>>down the results - but i know what you're talking about!
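
(for illustration, a minimal sketch of the kind of probe described above - a
64-bit key, a ~20-bit index taken with %, and a stored check field that can be
truncated. all names and sizes here are made up for the example, they are not
taken from any real program:)

#include <stdint.h>

#define TABLE_BITS 20
#define TABLE_SIZE (1u << TABLE_BITS)        /* ~20-bit index, used with % below */
#define CHECK_BITS 8                          /* 32 would be the "normal" setting */
#define CHECK_MASK ((uint32_t)(((uint64_t)1 << CHECK_BITS) - 1))

typedef struct {
    uint32_t check;     /* high bits of the 64-bit key, kept as a verification field */
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;
} hash_entry;

static hash_entry table[TABLE_SIZE];

/* the low bits of the key pick the slot; the stored check field (possibly
   truncated) decides whether the entry is treated as this position's entry */
hash_entry *probe(uint64_t key)
{
    hash_entry *e = &table[key % TABLE_SIZE];
    if ((e->check & CHECK_MASK) == ((uint32_t)(key >> 32) & CHECK_MASK))
        return e;       /* with CHECK_BITS == 8, ~1 in 256 foreign entries passes */
    return 0;
}
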
>>>>>
>>>>>Almost the same experiment with my chess engine (including many details, like the
>>>>>effective number of bits used, and going down to 8 bits only):
>>>>>http://chessprogramming.org/cccsearch/ccc.php?art_id=190318
>>>>>
>>>>>Regards,
>>>>>Dieter
>>>>
>>>>hi dieter,
>>>>
>>>>i had forgotten about your post on this, but now i remember it. very similar to
>>>>my observations, and if only we had written our observations up a bit more
>>>>seriously we could have written the paper that bob is publishing now ;-)
>>>>
>>>>cheers
>>>>  martin
>>>
>>>
>>>Hey, I'm easy to get along with here.  :)
>>>
>>>I have already asked one other person to do some similar testing.  I'd be happy
>>>to tell you both what I have done, and have you run similar tests, and join me
>>>as authors on this paper.
>>>
>>>I am doing the test slightly differently: rather than reducing to a specific
>>>number of signature bits, I am forcing a particular error rate (ie one error
>>>every N nodes), the idea being that I should be able to choose N such that the
>>>score never changes; or the score changes but not the best move; or the best
>>>move changes but it is not a bad change; or the best move changes and it
>>>probably changes the game outcome.
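
(as an aside: one way such a forced error rate could look at probe time,
reusing the made-up names from the sketch further up. this is only a guess at
the idea - the actual injection method used for the paper may well differ:)

/* illustrative only: the same probe as in the sketch above, but roughly one
   probe in every ERROR_RATE is forced to "hit" even though the check field
   does not match, so the search is fed data from a wrong position at a known
   rate */
#define ERROR_RATE 1000u                     /* the "N" in "one error every N" */

static uint64_t probe_count = 0;

hash_entry *probe_with_errors(uint64_t key)
{
    hash_entry *e = &table[key % TABLE_SIZE];
    int match = (e->check & CHECK_MASK) == ((uint32_t)(key >> 32) & CHECK_MASK);
    ++probe_count;
    if (match || probe_count % ERROR_RATE == 0)
        return e;                            /* second case is an injected error */
    return 0;
}
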
>>>
>>>If either/both are interested, email me and I can send you a draft, which
>>>explains how I am testing, and includes the test positions I am using.  I have
>>>some endgame positions (ie like fine 70), some sharp tactical positions like the
>>>Ba3 Botvinnik-Capablanca move, and some plain middlegame positions from games
>>>Crafty played on ICC.
>>>
>>>Let me know if you are interested...
>>
>>hi bob,
>>
>>this wasn't intended as criticism :-)
>>you are a computer scientist, and i am not; it is your job to write this sort of
>>paper - mine would be to write papers about physics...
>>anyway, i have nothing to contribute chess-wise: my program is 200-300 rating
>>points weaker than crafty, and i don't believe in writing academic papers about
>>"toy programs". as you recently pointed out to me, you do some things different
>>now that you search deeper (no more pin detection, higher R for nullmove).
>>for example, i could write a paper about how R=1 is better than R=2 for my toy
>>program, which is completely irrelevant in general, because for good programs
>>it's clear that R=2 is better than R=1. so the only thing i could contribute is
>>a similar experiment for checkers, where my program is much more advanced than
>>my chess program. but i doubt that that would really be interesting either :-)
>>
>>cheers
>>  martin
>
>Here you are dead wrong.
>The "quality" of the engine's play is not a factor,
>what is interesting is how hash collisions affect _any_ program at all.  IE one
>way to present this data is in a chart like this:
>
>        ---------one error every N probes---------
>pos     10     100    1000   10000  100000 1000000
> 1       -       -       -       -       -       -
> 2       S       S       S       -       -       -
> 3       S       -       -       -       -       -
> 4       -       -       -       -       -       -
> 5       S       S       S       -       -       -
> 6       S       S       S       S       -       -
> 7       -       -       -       -       -       -
> 8       G       S       -       -       -       -
> 9       S       -       -       -       -       -
>10       S       S       -       -       -       -
>11       S       -       -       -       -       -
>12       -       -       S       -       -       -
>13       S       -       -       -       -       -
>14       S       S       -       -       -       -
>15       S       S       -       -       -       -
>16       S       S       -       -       -       -
>17       M       -       -       -       -       -
>18       S       S       -       -       -       -
>
>Let me explain the data.  18 positions.  First 6 are endgames, second 6 are
>sharp tactical lines, last 6 are ordinary middlegame positions with no tactical
>shots.
>
>The columns should be obvious, 1 hash error every N probes into the table.
>
>The letters signify
>
>"S" means score changed from the score produced by no collisions.
>
>"M" means the best move changed, but when I searched it with a no-collision
>search it was not significantly different.
>
>"G" means that the best move changed, and it was way worse than the correct
>no-collision move, so that it is likely that this error would cause a change in
>the game outcome.  IE the Ba3 move by Botvinnik is a +3 move, but a couple of
>errors caused Crafty to totally miss the move and choose something that was
>about one piece worse, likely changing the game outcome.
>
>This is not complete, but it gives the idea.  We could take 2-3 programs and run
>the same test, and make this a composite table.  IE if any program changes the
>score or move, we indicate that.  Or if all programs change the score or move,
>we indicate that...
>
>Bob

sounds like a good idea. i'm not 100% sure whether the engine's strength has
zero influence though. e.g. omid's verified null-move pruning paper has the flaw
(IMO) that it uses a weak engine to check whether verified nullmove is better
than non-verified. as you said, whether R=1 / R=2 / R=3 is best may depend on
the search depth you reach, etc. i would assume the same to be true for other
kinds of experiments, like verified nullmove, or adding MPC to an existing
program like buro did with crafty. perhaps it works for one program, perhaps
not; and writing a paper based on a single program's performance may not be
such a great idea...
perhaps with this experiment it's the same for all engines - i don't know. i'd
certainly send you my data if you send me the test set - then again, perhaps you
should rather use some stronger engines to do this :-)
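
(for reference, R here is the depth reduction used for the nullmove search. a
minimal sketch of plain nullmove pruning, with the surrounding search and
move-making routines assumed to exist - not meant to match any particular
engine:)

/* sketch of plain nullmove pruning, only to show what the reduction R means */
extern int  search(int depth, int alpha, int beta);
extern void make_null_move(void);
extern void undo_null_move(void);

#define R 2                             /* R=1 reduces less; R=2 or R=3 prune more */

int null_move_cutoff(int depth, int beta)
{
    if (depth <= R)                     /* too shallow to try the reduced search */
        return 0;
    make_null_move();                   /* pass the move to the opponent */
    int score = -search(depth - 1 - R, -beta, -beta + 1);  /* reduced, null window */
    undo_null_move();
    return score >= beta;               /* fail high -> the node can be pruned */
}
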

cheers
  martin


