Computer Chess Club Archives


Subject: Re: Crafty Static Evals 2 questions

Author: Robert Hyatt

Date: 10:22:29 02/26/04

On February 26, 2004 at 04:25:57, martin fierz wrote:

>On February 25, 2004 at 12:39:29, Robert Hyatt wrote:
>
>>On February 25, 2004 at 12:13:34, martin fierz wrote:
>>
>>>On February 25, 2004 at 11:43:07, Robert Hyatt wrote:
>>>
>>>>On February 25, 2004 at 11:25:50, martin fierz wrote:
>>>>
>>>>>On February 25, 2004 at 10:58:36, Robert Hyatt wrote:
>>>>>
>>>>>>On February 25, 2004 at 09:13:43, martin fierz wrote:
>>>>>>
>>>>>>>On February 25, 2004 at 07:02:11, Dieter Buerssner wrote:
>>>>>>>
>>>>>>>>On February 25, 2004 at 05:56:16, martin fierz wrote:
>>>>>>>>
>>>>>>>>>it won't pop *my* eyes. i once reduced hash key sizes in my checkers program
>>>>>>>>>beyond all sensible settings, because there was a discussion here about whether
>>>>>>>>>you really need 64-bit keys. in my checkers program, i have 64 bit keys, but
>>>>>>>>>effectively it's only using about 52 bits. i have about a 20 bit part which is
>>>>>>>>>used for the hash index with %, and of the remaining 44 bits i store only 32 as a
>>>>>>>>>check. i reduced those 32 down to about 8 (!!) bits and in 100 test positions
>>>>>>>>>only saw one different move played IIRC. ridiculous, i must have lots of
>>>>>>>>>collisions there. unfortunately, i didn't count the collision number, or write
>>>>>>>>>down the results - but i know what you're talking about!
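
A minimal C sketch of that index/check split, with hypothetical names and sizes
(an illustration of the scheme, not the actual checkers code): roughly 20 bits
of the 64-bit key select a slot via %, only a masked part of the remaining bits
is stored as a verification check, and shrinking CHECK_MASK to 0xFFu is the
"about 8 bits" experiment.

#include <stdint.h>

#define TABLE_ENTRIES (1u << 20)   /* ~20 index bits; hypothetical table size */
#define CHECK_MASK    0xFFFFFFFFu  /* keep 32 check bits; 0xFFu keeps only 8  */

typedef struct {
    uint32_t check;                /* stored verification bits                */
    int score, depth, best_move;   /* whatever else the entry carries         */
} hash_entry;

static hash_entry table[TABLE_ENTRIES];

void store(uint64_t key, const hash_entry *in)
{
    hash_entry *e = &table[key % TABLE_ENTRIES];       /* index part of key */
    *e = *in;
    e->check = (uint32_t)(key >> 32) & CHECK_MASK;     /* check part of key */
}

int probe(uint64_t key, hash_entry *out)
{
    hash_entry *e = &table[key % TABLE_ENTRIES];
    if (e->check == ((uint32_t)(key >> 32) & CHECK_MASK)) {   /* "same" position? */
        *out = *e;
        return 1;
    }
    return 0;
}

With an 8-bit check, a different position that maps to the same slot agrees
with the stored check about 1 time in 256, so silent collisions become routine.
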
>>>>>>>>
>>>>>>>>Almost the same experiment with my chess engine (including many details, like the
>>>>>>>>effective number of bits used, and going down to 8 bits only):
>>>>>>>>http://chessprogramming.org/cccsearch/ccc.php?art_id=190318
>>>>>>>>
>>>>>>>>Regards,
>>>>>>>>Dieter
>>>>>>>
>>>>>>>hi dieter,
>>>>>>>
>>>>>>>i had forgotten about your post on this, but now i remember it. very similar to
>>>>>>>my observations, and if only we had written our observations up a bit more
>>>>>>>seriously we could have written the paper that bob is publishing now ;-)
>>>>>>>
>>>>>>>cheers
>>>>>>>  martin
>>>>>>
>>>>>>
>>>>>>Hey, I'm easy to get along with here.  :)
>>>>>>
>>>>>>I have already asked one other person to do some similar testing.  I'd be happy
>>>>>>to tell you both what I have done, and have you run similar tests, and join me
>>>>>>as authors on this paper.
>>>>>>
>>>>>>I am doing the test slightly differently: rather than cutting the signature
>>>>>>down to a specific number of bits, I am forcing a particular error rate (ie
>>>>>>one error every N nodes).  The idea is that I should be able to choose N such
>>>>>>that the score never changes, or the score changes but not the best move, or
>>>>>>the best move changes but it is not a bad change, or the best move changes and
>>>>>>it probably changes the game outcome.
>>>>>>
>>>>>>If either/both are interested, email me and I can send you a draft, which
>>>>>>explains how I am testing, and includes the test positions I am using.  I have
>>>>>>some endgame positions (ie like Fine 70), some sharp tactical positions like the
>>>>>>Ba3 Botvinnik-Capablanca move, and some plain middlegame positions from games
>>>>>>Crafty played on ICC.
>>>>>>
>>>>>>Let me know if you are interested...
>>>>>
>>>>>hi bob,
>>>>>
>>>>>this wasn't intended as criticism :-)
>>>>>you are a computer scientist, and i am not; it is your job to write this sort of
>>>>>paper - mine would be to write papers about physics...
>>>>>anyway, i have nothing to contribute chess-wise: my program is 200-300 rating
>>>>>points weaker than crafty, and i don't believe in writing academic papers about
>>>>>"toy programs". as you recently pointed out to me, you do some things differently
>>>>>now that you search deeper (no more pin detection, higher R for nullmove).
>>>>>for example, i could write a paper about how R=1 is better than R=2 for my toy
>>>>>program, which is completely irrelevant in general, because for good programs
>>>>>it's clear that R=2 is better than R=1. so the only thing i could contribute is
>>>>>a similar experiment for checkers, where my program is much more advanced than
>>>>>my chess program. but i doubt that that would really be interesting either :-)
>>>>>
>>>>>cheers
>>>>>  martin
>>>>
>>>>Here you are dead wrong.
>>>>The "quality" of the engine's play is not a factor;
>>>>what is interesting is how hash collisions affect _any_ program at all.  IE one
>>>>way to present this data is in a chart like this:
>>>>
>>>>        ---------one error every N probes---------
>>>>pos     10     100    1000   10000  100000 1000000
>>>> 1       -       -       -       -       -       -
>>>> 2       S       S       S       -       -       -
>>>> 3       S       -       -       -       -       -
>>>> 4       -       -       -       -       -       -
>>>> 5       S       S       S       -       -       -
>>>> 6       S       S       S       S       -       -
>>>> 7       -       -       -       -       -       -
>>>> 8       G       S       -       -       -       -
>>>> 9       S       -       -       -       -       -
>>>>10       S       S       -       -       -       -
>>>>11       S       -       -       -       -       -
>>>>12       -       -       S       -       -       -
>>>>13       S       -       -       -       -       -
>>>>14       S       S       -       -       -       -
>>>>15       S       S       -       -       -       -
>>>>16       S       S       -       -       -       -
>>>>17       M       -       -       -       -       -
>>>>18       S       S       -       -       -       -
>>>>
>>>>Let me explain the data.  18 positions.  First 6 are endgames, second 6 are
>>>>sharp tactical lines, last 6 are ordinary middlegame positions with no tactical
>>>>shots.
>>>>
>>>>The columns should be obvious: 1 hash error every N probes into the table.
>>>>
>>>>The letters signify
>>>>
>>>>"S" means score changed from the score produced by no collisions.
>>>>
>>>>"M" means the best move changed, but when I searched it with a no-collision
>>>>search it was not significantly different.
>>>>
>>>>"G" means that the best move changed, and it was way worse than the correct
>>>>no-collision move, so that it is likely that this error would cause a change in
>>>>the game outcome.  IE the Ba3 move by Botvinnik is a +3 move, but a couple of
>>>>errors caused Crafty to totally miss the move and choose something that was
>>>>about one piece worse, likely changing the game outcome.
>>>>
>>>>This is not complete, but it gives the idea.  We could take 2-3 programs and run
>>>>the same test, and make this a composite table.  IE if any program changes the
>>>>score or move, we indicate that.  Or if all programs change the score or move,
>>>>we indicate that...
>>>>
>>>>Bob
>>>
>>>sounds like a good idea. i'm not 100% sure whether the engine's strength has
>>>zero influence though. e.g. omid's verified null-move pruning paper has the flaw
>>>(IMO) that it uses a weak engine to check whether verified nullmove is better
>>>than non-verified. as you said, it may depend on the search depth you reach etc.
>>>whether R=1 / R=2 / R=3 is best or not. i would assume the same to be true for
>>>other kinds of experiments, like verified nullmove; or adding MPC to an existing
>>>program like buro did with crafty. perhaps it works for one program, perhaps
>>>not; and writing a paper based on one single program's performance may not be
>>>such a great idea...
>>>perhaps with this experiment it's the same for all engines - i don't know. i'd
>>>certainly send you my data if you send me the test set - then again, perhaps you
>>>should rather use some stronger engines to do this :-)
>>>
>>>cheers
>>>  martin
>>
>>
>>When you talk about programming algorithms, I don't disagree with you at all.
>>But when we talk about how hash collisions influence a program's performance, I
>>suspect that the results are going to be fairly close.  Granted that for a
>>material-only evaluation, hash errors might well produce fewer score changes,
>>only because the score is very "coarse" in granularity.  But once you get to
>>centipawn scores, I'd expect that programs would behave somewhat similarly even
>>if not identically.  And that would actually be an interesting aspect for
>>comparison, in fact...
>>
>>Of course, both of you are welcome to run the positions; the only requirement is
>>that you have to hack your hash probe code a small bit so that you can run an
>>experiment that is compatible with what I have done...
>>
>>If you (or Dieter) are interested in running the test, let me know.  The
>>positions are not "secret" at all, I sort of did what Dieter did, except I
>>started out with the idea of 1/3 endgames, 1/3 sharp tactical lines, 1/3 normal
>>positions, just to get a feel for what gets changed the most...
>>
>>I am using 18 positions so I can run the test many times with lots of different
>>error rates, hash table sizes, etc...
>
>hi bob,
>
>yes, i'd be interested. i guess i wouldn't have to change too much in my hash
>code; you're basically introducing errors at a specific rate, e.g. with a random
>function call which has a fixed probability of giving the error or with a
>counter which makes the error appear every N moves. i have 3 hash tables in my
>program, for pawns, for qsearch and for normal search. i take it it would only
>be the main hash table that you fudge with?
>
>cheers
>  martin


Yes.  I force an error every 10^N hash probes, and I am only doing this for the
transposition/refutation table, not pawn hashes and so forth.  I am varying N
from 1 to 7 and running the test set with hash=384M and 12M to see what the
difference is when the hash table is small vs large.
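
For concreteness, here is a minimal sketch of one way to "break" the probe code
in this spirit, assuming a counter-based scheme with hypothetical names
(trans_ref_probe, lookup_slot); it is not the exact hack used in Crafty, and a
random call with a fixed probability, as martin suggested, would work just as
well.  Every ERROR_INTERVAL-th probe into the transposition/refutation table is
accepted as a hit even when the stored check bits do not match, which is
exactly the effect of a collision; pawn hashes and other tables are probed
normally.

#include <stdint.h>

#define ERROR_INTERVAL 100000UL        /* 10^N probes per forced error, N = 1..7 */

typedef struct {                       /* same shape as the sketch further up    */
    uint32_t check;
    int score, depth, best_move;
} hash_entry;

extern hash_entry *lookup_slot(uint64_t key);   /* hypothetical table lookup */

static unsigned long probes_since_error;

int trans_ref_probe(uint64_t key, hash_entry *out)
{
    hash_entry *e = lookup_slot(key);
    int force_error = (++probes_since_error >= ERROR_INTERVAL);

    if (force_error)
        probes_since_error = 0;

    /* Normal path: require the stored check bits to match.  Forced path:
       accept the entry anyway, injecting one collision per interval. */
    if (e->check == (uint32_t)(key >> 32) || force_error) {
        *out = *e;
        return 1;
    }
    return 0;
}

Counting the forced errors as they happen would also give the collision numbers
martin wished he had written down in the checkers experiment.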

If you want to run the test, send me an email, I will send you the 18 positions
I am using along with the specific details of how I am "breaking" the hash code
so that we can all do it in the same way to get comparable results.  I have two
other volunteers already; a third would be nice.  IE it would be pretty
convincing if we all get the same sorts of numbers (within reason) using four
different chess engines.  I guess the only issue here is that by not including a
"certain" program, our results will all be invalid, but I'm not going _that_ far
as I have enough grey hair already. :)


