Author: martin fierz
Date: 04:11:21 02/27/04
On February 26, 2004 at 13:22:29, Robert Hyatt wrote:

>On February 26, 2004 at 04:25:57, martin fierz wrote:
>
>>On February 25, 2004 at 12:39:29, Robert Hyatt wrote:
>>
>>>On February 25, 2004 at 12:13:34, martin fierz wrote:
>>>
>>>>On February 25, 2004 at 11:43:07, Robert Hyatt wrote:
>>>>
>>>>>On February 25, 2004 at 11:25:50, martin fierz wrote:
>>>>>
>>>>>>On February 25, 2004 at 10:58:36, Robert Hyatt wrote:
>>>>>>
>>>>>>>On February 25, 2004 at 09:13:43, martin fierz wrote:
>>>>>>>
>>>>>>>>On February 25, 2004 at 07:02:11, Dieter Buerssner wrote:
>>>>>>>>
>>>>>>>>>On February 25, 2004 at 05:56:16, martin fierz wrote:
>>>>>>>>>
>>>>>>>>>>it won't pop *my* eyes. i once reduced hash key sizes in my checkers
>>>>>>>>>>program beyond all sensible settings, because there was a discussion here
>>>>>>>>>>about whether you really need 64-bit keys. in my checkers program, i have
>>>>>>>>>>64 bit keys, but effectively it's only using about 52 bits. i have about a
>>>>>>>>>>20 bit part which is used for the hash index with %, and of the remaining
>>>>>>>>>>44 bits i store only 32 as a check. i reduced those 32 down to about 8 (!!)
>>>>>>>>>>bits and in 100 test positions only saw one different move played IIRC.
>>>>>>>>>>ridiculous, i must have lots of collisions there. unfortunately, i didn't
>>>>>>>>>>count the collision number, or write down the results - but i know what
>>>>>>>>>>you're talking about!
>>>>>>>>>
>>>>>>>>>Almost the same experiment with my chess engine (including many details,
>>>>>>>>>like the effective number of bits used, and going down to 8 bits only):
>>>>>>>>>http://chessprogramming.org/cccsearch/ccc.php?art_id=190318
>>>>>>>>>
>>>>>>>>>Regards,
>>>>>>>>>Dieter
>>>>>>>>
>>>>>>>>hi dieter,
>>>>>>>>
>>>>>>>>i had forgotten about your post on this, but now i remember it. very similar
>>>>>>>>to my observations, and if only we had written our observations up a bit
>>>>>>>>more seriously we could have written the paper that bob is publishing now
>>>>>>>>;-)
>>>>>>>>
>>>>>>>>cheers
>>>>>>>>  martin
>>>>>>>
>>>>>>>Hey, I'm easy to get along with here. :)
>>>>>>>
>>>>>>>I have already asked one other person to do some similar testing. I'd be
>>>>>>>happy to tell you both what I have done, have you run similar tests, and
>>>>>>>have you join me as authors on this paper.
>>>>>>>
>>>>>>>I am doing the test slightly differently: rather than using a specific
>>>>>>>number of signature bits, I am forcing a particular error rate (ie one error
>>>>>>>every N nodes), the idea being that I should be able to choose N such that
>>>>>>>the score never changes, or the score changes but not the best move, or the
>>>>>>>best move changes but it is not a bad change, or the best move changes and
>>>>>>>it probably changes the game outcome.
>>>>>>>
>>>>>>>If either/both are interested, email me and I can send you a draft, which
>>>>>>>explains how I am testing, and includes the test positions I am using. I
>>>>>>>have some endgame positions (ie like fine 70), some sharp tactical positions
>>>>>>>like the Ba3 Botvinnik-Capablanca move, and some plain middlegame positions
>>>>>>>from games Crafty played on ICC.
>>>>>>>
>>>>>>>Let me know if you are interested...
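
A minimal sketch of the kind of reduced-signature probe martin and Dieter
describe above, assuming a 20-bit table index taken with % and a 32-bit check
that can be masked down to 8 bits; the entry layout, names and bit counts are
illustrative assumptions, not code from either engine:

  #include <stdint.h>

  #define TABLE_BITS   20     /* ~1M entries, indexed with %            */
  #define VERIFY_BITS   8     /* the 32-bit check, cut down to 8 bits   */
  #define VERIFY_MASK  ((uint32_t)((1ULL << VERIFY_BITS) - 1))

  typedef struct {
      uint32_t check;         /* upper key bits, stored as a check      */
      int16_t  score;
      uint8_t  depth;
      uint8_t  flags;
  } hash_entry;

  static hash_entry table[1 << TABLE_BITS];

  hash_entry *probe(uint64_t key)
  {
      hash_entry *h = &table[key % (1 << TABLE_BITS)];
      uint32_t check = (uint32_t)(key >> TABLE_BITS);

      /* with VERIFY_BITS = 32 a false match needs the whole check to
         collide; with VERIFY_BITS = 8 roughly one in 256 foreign
         entries passes, so collisions become frequent               */
      if ((h->check & VERIFY_MASK) == (check & VERIFY_MASK))
          return h;
      return 0;               /* miss */
  }

With 8 verification bits roughly one probe in 256 can match a foreign entry,
which is why both experiments above must have produced plenty of collisions -
and yet, as reported, almost no changed moves.
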
>>>>>>hi bob,
>>>>>>
>>>>>>this wasn't intended as criticism :-)
>>>>>>you are a computer scientist, and i am not; it is your job to write this
>>>>>>sort of paper - mine would be to write papers about physics...
>>>>>>anyway, i have nothing to contribute chess-wise: my program is 200-300
>>>>>>rating points weaker than crafty, and i don't believe in writing academic
>>>>>>papers about "toy programs". as you recently pointed out to me, you do some
>>>>>>things differently now that you search deeper (no more pin detection, higher
>>>>>>R for nullmove). for example, i could write a paper about how R=1 is better
>>>>>>than R=2 for my toy program, which is completely irrelevant in general,
>>>>>>because for good programs it's clear that R=2 is better than R=1. so the
>>>>>>only thing i could contribute is a similar experiment for checkers, where my
>>>>>>program is much more advanced than my chess program. but i doubt that that
>>>>>>would really be interesting either :-)
>>>>>>
>>>>>>cheers
>>>>>>  martin
>>>>>
>>>>>Here you are dead wrong. The "quality" of the engine's play is not a factor;
>>>>>what is interesting is how hash collisions affect _any_ program at all. IE
>>>>>one way to present this data is in a chart like this:
>>>>>
>>>>>        ---------one error every N probes---------
>>>>>pos     10    100   1000  10000  100000  1000000
>>>>>  1      -      -      -      -       -        -
>>>>>  2      S      S      S      -       -        -
>>>>>  3      S      -      -      -       -        -
>>>>>  4      -      -      -      -       -        -
>>>>>  5      S      S      S      -       -        -
>>>>>  6      S      S      S      S       -        -
>>>>>  7      -      -      -      -       -        -
>>>>>  8      G      S      -      -       -        -
>>>>>  9      S      -      -      -       -        -
>>>>> 10      S      S      -      -       -        -
>>>>> 11      S      -      -      -       -        -
>>>>> 12      -      -      S      -       -        -
>>>>> 13      S      -      -      -       -        -
>>>>> 14      S      S      -      -       -        -
>>>>> 15      S      S      -      -       -        -
>>>>> 16      S      S      -      -       -        -
>>>>> 17      M      -      -      -       -        -
>>>>> 18      S      S      -      -       -        -
>>>>>
>>>>>Let me explain the data. 18 positions. First 6 are endgames, second 6 are
>>>>>sharp tactical lines, last 6 are ordinary middlegame positions with no
>>>>>tactical shots.
>>>>>
>>>>>The columns should be obvious: 1 hash error every N probes into the table.
>>>>>
>>>>>The letters signify:
>>>>>
>>>>>"S" means the score changed from the score produced by no collisions.
>>>>>
>>>>>"M" means the best move changed, but when I searched it with a no-collision
>>>>>search it was not significantly different.
>>>>>
>>>>>"G" means that the best move changed, and it was way worse than the correct
>>>>>no-collision move, so that it is likely that this error would cause a change
>>>>>in the game outcome. IE the Ba3 move by Botvinnik is a +3 move, but a couple
>>>>>of errors caused Crafty to totally miss the move and choose something that
>>>>>was about one piece worse, likely changing the game outcome.
>>>>>
>>>>>This is not complete, but it gives the idea. We could take 2-3 programs and
>>>>>run the same test, and make this a composite table. IE if any program changes
>>>>>the score or move, we indicate that. Or if all programs change the score or
>>>>>move, we indicate that...
>>>>>
>>>>>Bob
>>>>
>>>>sounds like a good idea. i'm not 100% sure whether the engine's strength has
>>>>zero influence though. e.g. omid's verified null-move pruning paper has the
>>>>flaw (IMO) that it uses a weak engine to check whether verified nullmove is
>>>>better than non-verified. as you said, it may depend on the search depth you
>>>>reach etc. whether R=1 / R=2 / R=3 is best or not. i would assume the same to
>>>>be true for other kinds of experiments, like verified nullmove, or adding MPC
>>>>to an existing program like buro did with crafty. perhaps it works for one
>>>>program, perhaps not; and writing a paper based on one single program's
>>>>performance may not be such a great idea...
>>>>perhaps with this experiment it's the same for all engines - i don't know.
>>>>i'd certainly send you my data if you send me the test set - then again,
>>>>perhaps you should rather use some stronger engines to do this :-)
>>>>
>>>>cheers
>>>>  martin
>>>
>>>When you talk about programming algorithms, I don't disagree with you at all.
>>>But when we talk about how hash collisions influence a program's performance,
>>>I suspect that the results are going to be fairly close. Granted that for a
>>>material-only evaluation, hash errors might well produce fewer score changes,
>>>only because the score is very "coarse" in granularity. But once you get to
>>>centipawn scores, I'd expect that programs would behave somewhat similarly
>>>even if not identically. And that would actually be an interesting aspect for
>>>comparison, in fact...
>>>
>>>Of course, both of you are welcome to run the positions; the only requirement
>>>is that you have to hack your hash probe code a small bit so that you can run
>>>an experiment that is compatible with what I have done...
>>>
>>>If you (or Dieter) are interested in running the test, let me know. The
>>>positions are not "secret" at all. I sort of did what Dieter did, except I
>>>started out with the idea of 1/3 endgames, 1/3 sharp tactical lines, 1/3
>>>normal positions, just to get a feel for what gets changed the most...
>>>
>>>I am using 18 positions so I can run the test many times with lots of
>>>different error rates, hash table sizes, etc...
>>
>>hi bob,
>>
>>yes, i'd be interested. i guess i wouldn't have to change too much in my hash
>>code; you're basically introducing errors at a specific rate, e.g. with a
>>random function call which has a fixed probability of giving the error, or
>>with a counter which makes the error appear every N moves. i have 3 hash
>>tables in my program, for pawns, for qsearch and for normal search. i take it
>>it would only be the main hash table that you fudge with?
>>
>>cheers
>>  martin
>
>Yes. I force an error every 10^N hash probes, and I am only doing this for the
>transposition/refutation table, not pawn hashes and so forth. I am varying N
>from 1 to 7 and running the test set with hash=384M and 12M to see what the
>difference is when the hash table is small vs large.
>
>If you want to run the test, send me an email, and I will send you the 18
>positions I am using along with the specific details of how I am "breaking"
>the hash code so that we can all do it in the same way to get comparable
>results. I have two other volunteers already; a third would be nice. IE it
>would be pretty convincing if we all get the same sorts of numbers (within
>reason) using four different chess engines.

hi bob,

if you already have two other volunteers, then i'd rather not participate - i
don't think my engine deserves to be in the same paper as crafty :-) i'd say 3
similar results are just as good as 4....

>I guess the only issue here is that by not including a "certain" program, our
>results will all be invalid, but I'm not going _that_ far as I have enough
>grey hair already. :)

....except of course if you included said program it would get much more
significant :-)

cheers
  martin
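
The exact way Bob "breaks" the hash code is only in his draft, so the
following is just one plausible, counter-based sketch of forcing an error
every 10^N probes of the main transposition table (pawn and qsearch tables
untouched), reusing hash_entry, table and TABLE_BITS from the probe() sketch
further up; all names here are assumptions:

  #define ERROR_INTERVAL 100000     /* 10^5; Bob varies N from 1 to 7  */

  static uint64_t probe_count;      /* counts main-table probes only   */

  hash_entry *probe_forcing_errors(uint64_t key)
  {
      hash_entry *h = &table[key % (1 << TABLE_BITS)];
      uint32_t check = (uint32_t)(key >> TABLE_BITS);

      /* every ERROR_INTERVAL-th probe, accept whatever is stored in
         the slot even if the signature does not match: a deliberate,
         forced collision                                             */
      if (++probe_count % ERROR_INTERVAL == 0)
          return h;

      return (h->check == check) ? h : 0;
  }

A counter, rather than the random call martin mentions, makes the error rate
deterministic, so repeated runs of the same position hit the same forced
collisions and results stay reproducible.
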