Computer Chess Club Archives


Subject: Re: Crafty Static Evals 2 questions

Author: Robert Hyatt

Date: 07:58:06 02/27/04


On February 27, 2004 at 07:11:21, martin fierz wrote:

>On February 26, 2004 at 13:22:29, Robert Hyatt wrote:
>
>>On February 26, 2004 at 04:25:57, martin fierz wrote:
>>
>>>On February 25, 2004 at 12:39:29, Robert Hyatt wrote:
>>>
>>>>On February 25, 2004 at 12:13:34, martin fierz wrote:
>>>>
>>>>>On February 25, 2004 at 11:43:07, Robert Hyatt wrote:
>>>>>
>>>>>>On February 25, 2004 at 11:25:50, martin fierz wrote:
>>>>>>
>>>>>>>On February 25, 2004 at 10:58:36, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On February 25, 2004 at 09:13:43, martin fierz wrote:
>>>>>>>>
>>>>>>>>>On February 25, 2004 at 07:02:11, Dieter Buerssner wrote:
>>>>>>>>>
>>>>>>>>>>On February 25, 2004 at 05:56:16, martin fierz wrote:
>>>>>>>>>>
>>>>>>>>>>>it won't pop *my* eyes. i once reduced the hash key size in my checkers
>>>>>>>>>>>program beyond all sensible settings, because there was a discussion here
>>>>>>>>>>>about whether you really need 64-bit keys. in my checkers program i have
>>>>>>>>>>>64-bit keys, but effectively it only uses about 52 bits: about a 20-bit part
>>>>>>>>>>>is used for the hash index with %, and of the remaining 44 bits i store only
>>>>>>>>>>>32 as a check. i reduced those 32 down to about 8 (!!) bits, and in 100 test
>>>>>>>>>>>positions i saw only one different move played, IIRC. ridiculous - i must
>>>>>>>>>>>have had lots of collisions there. unfortunately, i didn't count the
>>>>>>>>>>>collisions or write down the results - but i know what you're talking about!
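A minimal sketch in C of the key-splitting scheme described above. The table
size, entry layout, and names are illustrative assumptions, not code from any
of the engines discussed; the point is only that with s check bits, a probe
landing on an entry for a different position still passes the check with
probability 1/2^s (about 1 in 256 for 8 bits).

#include <stdint.h>

#define TABLE_SIZE (1 << 20)               /* ~20 bits consumed by the index */
#define SIG_BITS   8                       /* deliberately tiny check field  */
#define SIG_MASK   ((1u << SIG_BITS) - 1)

typedef struct {
    uint32_t signature;                    /* truncated verification bits    */
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;
} HashEntry;

static HashEntry table[TABLE_SIZE];

/* Store: slot chosen by the low-order part of the key, identity recorded
   as a truncated signature taken from the high-order part. */
static void store(uint64_t key, int16_t score, uint8_t depth, uint8_t flags)
{
    HashEntry *e = &table[key % TABLE_SIZE];
    e->signature = (uint32_t)(key >> 32) & SIG_MASK;
    e->score     = score;
    e->depth     = depth;
    e->flags     = flags;
}

/* Probe: a wrong entry whose truncated signature happens to match is
   accepted as a hit -- that is a collision. */
static HashEntry *probe(uint64_t key)
{
    HashEntry *e = &table[key % TABLE_SIZE];
    uint32_t sig = (uint32_t)(key >> 32) & SIG_MASK;
    return e->signature == sig ? e : NULL;
}

Engines typically keep 32 or more signature bits precisely to push that
false-match rate far below anything the search would notice.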
>>>>>>>>>>
>>>>>>>>>>I did almost the same experiment with my chess engine (including many
>>>>>>>>>>details, like the effective number of bits used, and going down to only 8 bits):
>>>>>>>>>>http://chessprogramming.org/cccsearch/ccc.php?art_id=190318
>>>>>>>>>>
>>>>>>>>>>Regards,
>>>>>>>>>>Dieter
>>>>>>>>>
>>>>>>>>>hi dieter,
>>>>>>>>>
>>>>>>>>>i had forgotten about your post on this, but now i remember it. very similar to
>>>>>>>>>my observations, and if only we had written our observations up a bit more
>>>>>>>>>seriously we could have written the paper that bob is publishing now ;-)
>>>>>>>>>
>>>>>>>>>cheers
>>>>>>>>>  martin
>>>>>>>>
>>>>>>>>
>>>>>>>>Hey, I'm easy to get along with here.  :)
>>>>>>>>
>>>>>>>>I have already asked one other person to do some similar testing.  I'd be happy
>>>>>>>>to tell you both what I have done, and have you run similar tests, and join me
>>>>>>>>as authors on this paper.
>>>>>>>>
>>>>>>>>I am doing the test slightly differently: rather than using a specific number
>>>>>>>>of signature bits, I am forcing a particular error rate (ie one error every N
>>>>>>>>nodes), the idea being that I should be able to find values of N where the
>>>>>>>>score never changes, where the score changes but not the best move, where the
>>>>>>>>best move changes but not for the worse, and where the best move changes badly
>>>>>>>>enough that it probably changes the game outcome.
>>>>>>>>
>>>>>>>>If either/both of you are interested, email me and I can send you a draft that
>>>>>>>>explains how I am testing and includes the test positions I am using.  I have
>>>>>>>>some endgame positions (ie Fine #70 and the like), some sharp tactical
>>>>>>>>positions like the Ba3 move from Botvinnik-Capablanca, and some plain
>>>>>>>>middlegame positions from games Crafty played on ICC.
>>>>>>>>
>>>>>>>>Let me know if you are interested...
>>>>>>>
>>>>>>>hi bob,
>>>>>>>
>>>>>>>this wasn't intended as criticism :-)
>>>>>>>you are a computer scientist, and i am not; it is your job to write this sort of
>>>>>>>paper - mine would be to write papers about physics...
>>>>>>>anyway, i have nothing to contribute chess-wise: my program is 200-300 rating
>>>>>>>points weaker than crafty, and i don't believe in writing academic papers about
>>>>>>>"toy programs". as you recently pointed out to me, you do some things different
>>>>>>>now that you search deeper (no more pin detection, higher R for nullmove).
>>>>>>>for example, i could write a paper about how R=1 is better than R=2 for my toy
>>>>>>>program, which is completely irrelevant in general, because for good programs
>>>>>>>it's clear that R=2 is better than R=1. so the only thing i could contribute is
>>>>>>>a similar experiment for checkers, where my program is much more advanced than
>>>>>>>my chess program. but i doubt that that would really be interesting either :-)
>>>>>>>
>>>>>>>cheers
>>>>>>>  martin
>>>>>>
>>>>>>Here you are dead wrong.
>>>>>>The "quality" of the engine's play is not a factor,
>>>>>>what is interesting is how hash collisions affect _any_ program at all.  IE one
>>>>>>way to present this data is in a chart like this:
>>>>>>
>>>>>>        ---------one error every N probes---------
>>>>>>pos     10     100    1000   10000  100000 1000000
>>>>>> 1       -       -       -       -       -       -
>>>>>> 2       S       S       S       -       -       -
>>>>>> 3       S       -       -       -       -       -
>>>>>> 4       -       -       -       -       -       -
>>>>>> 5       S       S       S       -       -       -
>>>>>> 6       S       S       S       S       -       -
>>>>>> 7       -       -       -       -       -       -
>>>>>> 8       G       S       -       -       -       -
>>>>>> 9       S       -       -       -       -       -
>>>>>>10       S       S       -       -       -       -
>>>>>>11       S       -       -       -       -       -
>>>>>>12       -       -       S       -       -       -
>>>>>>13       S       -       -       -       -       -
>>>>>>14       S       S       -       -       -       -
>>>>>>15       S       S       -       -       -       -
>>>>>>16       S       S       -       -       -       -
>>>>>>17       M       -       -       -       -       -
>>>>>>18       S       S       -       -       -       -
>>>>>>
>>>>>>Let me explain the data.  18 positions.  First 6 are endgames, second 6 are
>>>>>>sharp tactical lines, last 6 are ordinary middlegame positions with no tactical
>>>>>>shots.
>>>>>>
>>>>>>The columns should be obvious: 1 hash error every N probes into the table.
>>>>>>
>>>>>>The letters signify:
>>>>>>
>>>>>>"S" means score changed from the score produced by no collisions.
>>>>>>
>>>>>>"M" means the best move changed, but when I searched it with a no-collision
>>>>>>search it was not significantly different.
>>>>>>
>>>>>>"G" means that the best move changed, and it was way worse than the correct
>>>>>>no-collision move, so that it is likely that this error would cause a change in
>>>>>>the game outcome.  IE the Ba3 move by Botvinnik is a +3 move, but a couple of
>>>>>>errors caused Crafty to totally miss the move and choose something that was
>>>>>>about one piece worse, likely changing the game outcome.
>>>>>>
>>>>>>This is not complete, but it gives the idea.  We could take 2-3 programs and run
>>>>>>the same test, and make this a composite table.  IE if any program changes the
>>>>>>score or move, we indicate that.  Or if all programs change the score or move,
>>>>>>we indicate that...
>>>>>>
>>>>>>Bob
>>>>>
>>>>>sounds like a good idea. i'm not 100% sure whether the engine's strength has
>>>>>zero influence though. e.g. omid's verified null-move pruning paper has the flaw
>>>>>(IMO) that it uses a weak engine to check whether verified nullmove is better
>>>>>than non-verified. as you said, whether R=1 / R=2 / R=3 is best may depend on
>>>>>the search depth you reach, etc. i would assume the same is true for other
>>>>>kinds of experiments, like verified nullmove, or adding MPC to an existing
>>>>>program like buro did with crafty. perhaps it works for one program, perhaps
>>>>>not; and writing a paper based on a single program's performance may not be
>>>>>such a great idea...
>>>>>perhaps with this experiment it's the same for all engines - i don't know. i'd
>>>>>certainly send you my data if you send me the test set - then again, perhaps you
>>>>>should rather use some stronger engines to do this :-)
>>>>>
>>>>>cheers
>>>>>  martin
>>>>
>>>>
>>>>When you talk about programming algorithms, I don't disagree with you at all.
>>>>But when we talk about how hash collisions influence a program's performance, I
>>>>suspect that the results are going to be fairly close.  Granted, for a
>>>>material-only evaluation, hash errors might well produce fewer score changes,
>>>>simply because the score is very "coarse" in granularity.  But once you get to
>>>>centipawn scores, I'd expect that programs would behave somewhat similarly even
>>>>if not identically.  And that would actually be an interesting aspect for
>>>>comparison, in fact...
>>>>
>>>>Of course, both of you are welcome to run the positions, the only requirement is
>>>>that you have to hack your hash probe code a small bit so that you can run an
>>>>experiment that is compatible with what I have done...
>>>>
>>>>If you (or Dieter) are interested in running the test, let me know.  The
>>>>positions are not "secret" at all; I did roughly what Dieter did, except I
>>>>started out with the idea of 1/3 endgames, 1/3 sharp tactical lines, and 1/3
>>>>normal positions, just to get a feel for what gets changed the most...
>>>>
>>>>I am using 18 positions so I can run the test many times with lots of different
>>>>error rates, hash table sizes, etc...
>>>
>>>hi bob,
>>>
>>>yes, i'd be interested. i guess i wouldn't have to change too much in my hash
>>>code; you're basically introducing errors at a specific rate, e.g. with a random
>>>function call which has a fixed probability of producing the error, or with a
>>>counter which makes the error appear every N probes. i have 3 hash tables in my
>>>program: for pawns, for qsearch, and for normal search. i take it it would only
>>>be the main hash table that you fudge with?
>>>
>>>cheers
>>>  martin
>>
>>
>>Yes.  I force an error every 10^N hash probes, and I am only doing this for the
>>transposition/refutation table, not pawn hashes and so forth.  I am varying N
>>from 1 to 7 and running the test set with hash=384M and 12M to see what the
>>difference is when the hash table is small vs large.
>>
>>If you want to run the test, send me an email, I will send you the 18 positions
>>I am using along with the specific details of how I am "breaking" the hash code
>>so that we can all do it in the same way to get comparable results.  I have two
>>other volunteers already, a third would be nice.  IE it would be pretty
>>convincing if we all get the same sorts of numbers (within reason) using four
>>different chess engines.
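A counter-based sketch, in C, of the sort of hash-probe hack described above.
The function and variable names are assumptions for illustration, not the
actual modification used for the paper; the idea is just that every
error_interval-th probe is forced to report a match whether or not the
signatures agree, which has the same effect as a genuine collision.

#include <stdint.h>

static uint64_t probe_count    = 0;
static uint64_t error_interval = 1000;   /* 10^N, here with N = 3 */

/* Signature comparison used only by the transposition-table probe; the
   pawn hash and any other tables are left untouched.  Every
   error_interval-th call pretends the signatures matched, so the search
   consumes whatever entry happens to occupy the slot. */
static int signatures_match(uint64_t stored_sig, uint64_t probe_sig)
{
    if (++probe_count % error_interval == 0)
        return 1;                        /* forced collision */
    return stored_sig == probe_sig;
}

Varying error_interval from 10^1 through 10^7 sweeps the error rates shown in
the table above, and because only this one comparison changes, results from
different engines remain comparable.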
>
>hi bob,
>
>if you already have two other volunteers, then i'd rather not participate - i
>don't think my engine deserves to be in the same paper as crafty :-)
>i'd say 3 similar results are just as good as 4....
>
>>I guess the only issue here is that by not including a
>>"certain" program, our results will all be invalid, but I'm not going _that_ far
>>as I have enough grey hair already. :)
>
>....except of course if you included said program it would get much more
>significant :-)
>
>cheers
>  martin


I suppose I will just have to wait for the "the data was faked.  All four of the
programmers faked the data."  Or "the data is meaningless, it is from rude
amateur programs rather than a
professionally-written-by-the-world's-greatest-programmer program that would
produce meaningful data."

Etc.

I can hardly wait.

:)

I still don't know what a "rude program" is of course.

:)


