Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Opionion needed:Zobrist type 1 error

Author: Heiner Marxen

Date: 09:38:45 07/02/00

Go up one level in this thread


On July 02, 2000 at 06:04:50, TEERAPONG TOVIRAT wrote:

>>Yes, sure, you are right, and I think I do understand it quite well.
>>I just wanted to say, that the resulting approximate formula for the
>>error probability depends on N and H, and not on P (IMO).
>
>Thanks for your interest to my post. No one besides you seems to do so.

Huh.  Now I'm a bit puzzled, and just re-read the thread carefully.
I suspect strongly that we do not really disagree.  See below.

>Perhaps it's too difficult for me to express  the third factor (P/H ) in
>detail(English).

I perfectly do understand P/H, which is the ideal number of real positions
mapping to the same hash value.  Among these we can have a "type 1 error".
I hope you will agree with this description.

May be you mean something different with "the third factor":  how far
the real hash function is away from the ideal condition.  When a hash
function does distribute the P positions very unevenly into the H hash
values, this has a negative effect, of course.

Is that your "third factor"?
If so, I apologize for misunderstanding.

Anyhow, let me further comment what you write...

>Let me try. Do you notice N and H are relatively constant or controlled
>by limitation of RAM ?

N is the size of our problem (the number of different position occurring
during a search, which we want to remember).
So we cannot change this without re-defining the problem itself.  Ok.

H is the number of different hash function values.  This is not the
same as the actual number of hash table entries.  E.g. we can have
a 32 bit wide hash function (H = 2^32) but only have 4M hash table
entries (2^22).  While the amount of RAM clearly limits the number
of hash table entries we can implement, it does not really limit
the width of our hash function.  If we like, we may implement a 64-bit
hash function, or an 80-bit hash function etc.

IMO, the choice of a good H is about the only thing where we really can
influence the probability of type 1 errors... except for the quality of
the hash function (which I now guess is your "third factor").

>                       In my hypothesis I base on assumption that every
>hash value has approximately equal number of positions,but I think in
>the actual situation is different.

Aha!

>                                   I don't know about the real distribution.

It does entirely depend upon the exact implementation of the hash function.
A Zobrist hash function is based on a pool of "random" numbers, which
are different for many implementations, and can make all the difference
between a good and a bad quality.

You will have to measure the "real distribution" for every hash function.
The 32-bit Zobrist hash function in Chest has a type 1 error rate of
1/1000 or less, which is quite within the theoretically expected range.
Since Chest must store and compare the complete key anyhow, that is good
enough for me (Chest does not tolerate even type 1 errors).

>I know only  a lot of people here say " not all random numbers are suitable
>for hash value  some of  them leading to clash so often than others" .

Yes, there are bad choices possible.

>So, the third factor should be the real number of positions that have the
>same hash value not just by approximation.

AHA!
I am reading it the third time, now, and it becomes clearer now.  :-)
(May be that coffee helped also :-)

>                                           And now if my hypothesis
>is correct.

Just to be clear: IMO your hypothesis, analysis and conclusion are correct.

>            Will it be useful? As I said above we cannot control N,H.
>I think we can detect a good series of random numbers by using these facts.

Now we are straight into the quality of the random numbers, right?

>In last few days I spent hours in experiment about this. I test my checkers
>program
>with 32 bit hash table. The results are quite the same as I expect.

Hey, that is great!

>At the early stage of game the incidence of  the type 1 error is so small.
>As the number of positions occupied on the table increase the incidence
>also increases until the table is fully occupied the incidence tend to be
>stable at some constant.

So far ok.

>                         My current problem is my incidence of error is
>higher than I expect. I got 0.5% instead of 1/4000. Do you think I can blame
>it on my random integers?  or Do you have any idea to generate a good series
>of random number?

Hmmm, that is sort of a problem.
Just to be sure: are you sure that you measure type 1 errors (different
position with same hash value) and not type 2 errors (different hash value
in the same hash entry)?

As I mentioned already, the collection of random numbers _can_ be bad,
and result in such an uneven distribution, that it basically works as
if it had fewer bits.  That can make such a difference.

I have generated my random numbers with the Unix library function "lrand48()".
I am not sure how others do it, but you may have a look into the Crafty
sources, may be the generation method is documented there.

A quick check one can perform is to make sure that the random numbers
pairwise have a "good" hamming distance (hamming distance = number of bits
with different value = popcount(x XOR y)).  Small hamming distances are
considered bad.  All too large ones are also considered bad, I believe.
For 32-bit numbers a hamming distance of about 16 would be best.
I believe this has been discussed in this forum some time ago.
May be you can find that in the archives.

Of course, there are always the standard methods for testing the quality
of pseudo random number generators (see e.g. Knuth TAoCP Vol 2).

>I'm sorry it's quite a long post.

Well... my answer is even longer :-)
We are both very on-topic, so I don't see a problem with the length.  :-)

>Thanks,
>Teerapong

I hope that the above has cleared up things.

Cheers,
Heiner



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.