Author: Robert Hyatt
Date: 09:39:29 02/25/04
On February 25, 2004 at 12:13:34, martin fierz wrote:

>On February 25, 2004 at 11:43:07, Robert Hyatt wrote:
>
>>On February 25, 2004 at 11:25:50, martin fierz wrote:
>>
>>>On February 25, 2004 at 10:58:36, Robert Hyatt wrote:
>>>
>>>>On February 25, 2004 at 09:13:43, martin fierz wrote:
>>>>
>>>>>On February 25, 2004 at 07:02:11, Dieter Buerssner wrote:
>>>>>
>>>>>>On February 25, 2004 at 05:56:16, martin fierz wrote:
>>>>>>
>>>>>>>it won't pop *my* eyes. i once reduced hash key sizes in my checkers program
>>>>>>>beyond all sensible settings, because there was a discussion here about whether
>>>>>>>you really need 64-bit keys. in my checkers program, i have 64 bit keys, but
>>>>>>>effectively it's only using about 52 bits. i have about a 20 bit part which is
>>>>>>>used for the hashindex with %, and of the remaining 44 bits i store only 32 as a
>>>>>>>check. i reduced those 32 down to about 8 (!!) bits and in 100 test positions
>>>>>>>only saw one different move played IIRC. ridiculous, i must have lots of
>>>>>>>collisions there. unfortunately, i didn't count the collision number, or write
>>>>>>>down the results - but i know what you're talking about!
>>>>>>
>>>>>>Almost the same experiment with my chess engine (including many details, like the
>>>>>>effective number of bits used, and going down to 8 bits only):
>>>>>>http://chessprogramming.org/cccsearch/ccc.php?art_id=190318
>>>>>>
>>>>>>Regards,
>>>>>>Dieter
>>>>>
>>>>>hi dieter,
>>>>>
>>>>>i had forgotten about your post on this, but now i remember it. very similar to
>>>>>my observations, and if only we had written our observations up a bit more
>>>>>seriously we could have written the paper that bob is publishing now ;-)
>>>>>
>>>>>cheers
>>>>>  martin
>>>>
>>>>
>>>>Hey, I'm easy to get along with here. :)
>>>>
>>>>I have already asked one other person to do some similar testing. I'd be happy
>>>>to tell you both what I have done, and have you run similar tests, and join me
>>>>as authors on this paper.
>>>>
>>>>I am doing the test slightly differently: rather than a specific number of
>>>>signature bits, I am forcing a particular error rate (ie one error every N
>>>>nodes), with the idea being that I should be able to choose N such that the
>>>>score never changes, or the score changes but not the best move, or the best
>>>>move changes but it is not a bad change, or the best move changes and it
>>>>probably changes the game outcome.
>>>>
>>>>If either/both are interested, email me and I can send you a draft, which
>>>>explains how I am testing, and includes the test positions I am using. I have
>>>>some endgame positions (ie like fine 70), some sharp tactical positions like the
>>>>Ba3 Botvinnik-Capablanca move, and some plain middlegame positions from games
>>>>Crafty played on ICC.
>>>>
>>>>Let me know if you are interested...
>>>
>>>hi bob,
>>>
>>>this wasn't intended as criticism :-)
>>>you are a computer scientist, and i am not; it is your job to write this sort of
>>>paper - mine would be to write papers about physics...
>>>anyway, i have nothing to contribute chess-wise: my program is 200-300 rating
>>>points weaker than crafty, and i don't believe in writing academic papers about
>>>"toy programs". as you recently pointed out to me, you do some things different
>>>now that you search deeper (no more pin detection, higher R for nullmove).
>>>for example, i could write a paper about how R=1 is better than R=2 for my toy
>>>program, which is completely irrelevant in general, because for good programs
>>>it's clear that R=2 is better than R=1. so the only thing i could contribute is
>>>a similar experiment for checkers, where my program is much more advanced than
>>>my chess program. but i doubt that that would really be interesting either :-)
>>>
>>>cheers
>>>  martin
>>
>>Here you are dead wrong.
>>The "quality" of the engine's play is not a factor,
>>what is interesting is how hash collisions affect _any_ program at all. IE one
>>way to present this data is in a chart like this:
>>
>>      ---------one error every N probes---------
>>pos   10   100   1000   10000   100000   1000000
>>  1    -     -      -       -        -         -
>>  2    S     S      S       -        -         -
>>  3    S     -      -       -        -         -
>>  4    -     -      -       -        -         -
>>  5    S     S      S       -        -         -
>>  6    S     S      S       S        -         -
>>  7    -     -      -       -        -         -
>>  8    G     S      -       -        -         -
>>  9    S     -      -       -        -         -
>> 10    S     S      -       -        -         -
>> 11    S     -      -       -        -         -
>> 12    -     -      S       -        -         -
>> 13    S     -      -       -        -         -
>> 14    S     S      -       -        -         -
>> 15    S     S      -       -        -         -
>> 16    S     S      -       -        -         -
>> 17    M     -      -       -        -         -
>> 18    S     S      -       -        -         -
>>
>>Let me explain the data. 18 positions. The first 6 are endgames, the second 6 are
>>sharp tactical lines, the last 6 are ordinary middlegame positions with no tactical
>>shots.
>>
>>The columns should be obvious: 1 hash error every N probes into the table.
>>
>>The letters signify:
>>
>>"S" means the score changed from the score produced by no collisions.
>>
>>"M" means the best move changed, but when I searched it with a no-collision
>>search it was not significantly different.
>>
>>"G" means that the best move changed, and it was way worse than the correct
>>no-collision move, so that it is likely that this error would cause a change in
>>the game outcome. IE the Ba3 move by Botvinnik is a +3 move, but a couple of
>>errors caused Crafty to totally miss the move and choose something that was
>>about one piece worse, likely changing the game outcome.
>>
>>This is not complete, but it gives the idea. We could take 2-3 programs and run
>>the same test, and make this a composite table. IE if any program changes the
>>score or move, we indicate that. Or if all programs change the score or move,
>>we indicate that...
>>
>>Bob
>
>sounds like a good idea. i'm not 100% sure whether the engine's strength has
>zero influence though. e.g. omid's verified null-move pruning paper has the flaw
>(IMO) that it uses a weak engine to check whether verified nullmove is better
>than non-verified. as you said, it may depend on the search depth you reach etc.
>whether R=1 / R=2 / R=3 is best or not. i would assume the same to be true for
>other kinds of experiments, like verified nullmove; or adding MPC to an existing
>program like buro did with crafty. perhaps it works for one program, perhaps
>not; and writing a paper based on one single program's performance may not be
>such a great idea...
>perhaps with this experiment it's the same for all engines - i don't know. i'd
>certainly send you my data if you send me the test set - then again, perhaps you
>should rather use some stronger engines to do this :-)
>
>cheers
>  martin

When you talk about programming algorithms, I don't disagree with you at all.
But when we talk about how hash collisions influence a program's performance, I
suspect that the results are going to be fairly close. Granted, for a
material-only evaluation, hash errors might well produce fewer score changes,
only because the score is very "coarse" in granularity.
But once you get to centipawn scores, I'd expect that programs would behave
somewhat similarly, even if not identically. And that would actually be an
interesting aspect for comparison, in fact...

Of course, both of you are welcome to run the positions; the only requirement is
that you have to hack your hash probe code a small bit so that you can run an
experiment that is compatible with what I have done...

If you (or Dieter) are interested in running the test, let me know. The positions
are not "secret" at all. I sort of did what Dieter did, except I started out with
the idea of 1/3 endgames, 1/3 sharp tactical lines, 1/3 normal positions, just to
get a feel for what gets changed the most... I am using 18 positions so I can run
the test many times with lots of different error rates, hash table sizes, etc...
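To give an idea of the kind of probe hack involved, here is a minimal sketch in C.
This is not Crafty's actual code, and all of the names (HashProbe, HashEntry,
error_every_n, etc.) are made up for illustration: the low bits of the 64-bit key
index the table, the high 32 bits are stored as a signature check, and a counter
forces roughly one collision every N probes by deliberately returning a
non-matching entry.

#include <stdint.h>
#include <stddef.h>

typedef struct {
  uint32_t check;   /* high 32 bits of the key, verified on every probe */
  int16_t  score;
  uint8_t  depth;
  uint8_t  flags;
} HashEntry;

#define TABLE_BITS 20
#define TABLE_SIZE (1u << TABLE_BITS)

static HashEntry table[TABLE_SIZE];
static uint64_t  probes = 0;
static uint64_t  error_every_n = 100000;   /* forced error rate: 1 in N probes */

HashEntry *HashProbe(uint64_t key) {
  HashEntry *e = &table[key & (TABLE_SIZE - 1)]; /* low bits -> table index   */
  uint32_t check = (uint32_t)(key >> 32);        /* high bits -> signature    */

  probes++;
  if (e->check == check)
    return e;                       /* genuine hit                            */
  if (probes % error_every_n == 0)
    return e;                       /* forced collision: wrong entry accepted */
  return NULL;                      /* ordinary miss                          */
}

Varying error_every_n to match the columns in the table above is all the
experiment needs; the rest of the search stays untouched.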