# Computer Chess Club Archives

## Messages

### Subject: Re: chess and neural networks

Author: Vincent Diepeveen

Date: 15:30:25 07/06/03

```
On July 06, 2003 at 16:35:23, Uri Blass wrote:

>On July 06, 2003 at 15:51:49, Vincent Diepeveen wrote:
>
>>On July 05, 2003 at 22:58:51, Christophe Theron wrote:
>>
>>>On July 05, 2003 at 13:55:12, Vincent Diepeveen wrote:
>>>
>>>>On July 05, 2003 at 00:25:24, Christophe Theron wrote:
>>>>
>>>>>On July 04, 2003 at 23:56:34, Vincent Diepeveen wrote:
>>>>>
>>>>>>On July 04, 2003 at 11:32:03, Christophe Theron wrote:
>>>>>>
>>>>>>>On July 03, 2003 at 15:44:44, Landon Rabern wrote:
>>>>>>>
>>>>>>>>On July 03, 2003 at 03:22:15, Christophe Theron wrote:
>>>>>>>>
>>>>>>>>>On July 02, 2003 at 13:13:43, Landon Rabern wrote:
>>>>>>>>>
>>>>>>>>>>On July 02, 2003 at 02:18:48, Dann Corbit wrote:
>>>>>>>>>>
>>>>>>>>>>>On July 02, 2003 at 02:03:20, Landon Rabern wrote:
>>>>>>>>>>>[snip]
>>>>>>>>>>>>I made an attempt to use a NN for determining extensions and reductions.  It was
>>>>>>>>>>>>evolved using a GA, kinda worked, but I ran out of time to work on it at the
>>>>>>>>>>>>end of school and don't have my computer anymore. The problem is that the NN is
>>>>>>>>>>>>SLOW, even using x/(1+|x|) for activation instead of tanh(x).
>>>>>>>>>>>
>>>>>>>>>>>Precompute a hyperbolic tangent table and store it in an array.  Speeds it up a
>>>>>>>>>>>lot.
>>>>>>>>>>
>>>>>>>>>>Well, x/(1+|x|) is as fast or faster than a large table lookup.  The slowdown
>>>>>>>>>>was from all the looping necessary for the feedforward.
>>>>>>>>>>
>>>>>>>>>>Landon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>A stupid question maybe, but I'm very interested by this stuff:
>>>>>>>>>
>>>>>>>>>Do you really need a lot of accuracy for the "activation function"? Would it be
>>>>>>>>>possible to consider a 256 values output for example?
>>>>>>>>>
>>>>>>>>>Would the lack of accuracy hurt?
>>>>>>>>>
>>>>>>>>>I'm not sure, but it seems to me that biological neurons do not need a lot of
>>>>>>>>>accuracy in their output, and even worse: they are noisy. So I wonder if low
>>>>>>>>>accuracy would be enough.
>>>>>>>>>
>>>>>>>>
>>>>>>>>There are neural net models that work with only binary output.  If the total
>>>>>>>>input value exceeds some threshold then you get a 1 otherwise a 0.  The problem
>>>>>>>>is with training them by back prop.  But in this case I was using a Genetic Alg,
>>>>>>>>so no back prop at all - so no problem.  It might work, but I don't see the
>>>>>>>>benefit - were you thinking for speed?  The x/(1+|x|) is pretty fast to
>>>>>>>>calculate, but perhaps the binary (or other discrete) would be faster.
>>>>>>>>Something to try.
>>>>>>>>
>>>>>>>>Landon
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Yes, what I had in mind was optimization by using integer arithmetic only.
>>>>>>>
>>>>>>>If the output is always on 8 bits, the sigma(W*I) (weight*input) can be computed
>>>>>>>on 32 bits (each W*I will have at most 16 bits).
>>>>>>>
>>>>>>>Actually sigma(W*I) will have no more than 20 bits if each neuron has at most 16
>>>>>>>inputs. 32 bits allows for 65536 input per neuron.
>>>>>>>
>>>>>>>This -maybe- allows for a fast table lookup of the atan function that I see used
>>>>>>>often in ANN. I think it can be a little faster than x/(1+|x|) computed using
>>>>>>>floating point arithmetic. Also, and this is even more important, the sigma(W*I)
>>>>>>>would use integer arithmetic instead of floating point.
>>>>>>>
>>>>>>>Maybe I should just do a Google search for this, I'm sure I'm not the first one
>>>>>>
>>>>>>I'm actually sure you are the first to find this optimization!
>>>>>>
>>>>>>The reason is that the average AI scientist never does many practical
>>>>>>experiments with ANNs. Basically, practical researchers outside ANNs
>>>>>>sometimes do a few experiments, like you and I. Beyond that, among the very
>>>>>>tiny percentage of researchers who sometimes do ANN experiments, the average
>>>>>>solution they come up with when they need to calculate faster is either to ask
>>>>>>for some system time on a supercomputer, or more likely to fill a sports hall
>>>>>>at their own university department with PCs, lay a few network cables under
>>>>>>the floor and run a few carefully selected benchmarks showing their Beowulf
>>>>>>cluster really is a great thing to have.
>>>>>>
>>>>>>When, 10 years or so ago, some dude was looking around for funding for either a
>>>>>>supercomputer or hardware to speed up his neural networking software, which
>>>>>>was using quite a few neurons, I translated his QuickBASIC program into
>>>>>>C and optimized its speed by writing things out and finding clever loops within
>>>>>>it that sped it up losslessly.
>>>>>>
>>>>>>In total I managed to speed up his software by around a factor of 1000 after 7
>>>>>>days of hard work (remember I started with 100KB of QuickBASIC code and ended
>>>>>>up with about 20KB of C code. Note that the QuickBASIC used was the compiler
>>>>>>version, not an interpreter).
>>>>>>
>>>>>>I was very amazed then that he didn't like me doing that, because i had thought
>>>>>>he just wanted his software to run faster. When you grow up you slowly learn how
>>>>>>people work and in the AI world this is pretty easy to predict.
>>>>>>
>>>>>>So having that in mind i am sure that you are one of the first to publicly speak
>>>>>>out and say that you can speedup things a lot!
>>>>>>
>>>>>>Last Tuesday I was at a supercomputing conference and of course I talked for
>>>>>>hours with many researchers and professors. I am still very proud that, after
>>>>>>hearing what they did on the computer, I told none of them that I would love
>>>>>>to take a look at their code in order to give them a few free tips to speed up
>>>>>>their software several times over. With some of them I surely knew I could.
>>>>>>
>>>>>>Some still haven't found out the difference between latency and bandwidth, or
>>>>>>what out of order means (the R14k processors), or that the new Itanium 2
>>>>>>processors here (416 of them clocked at 1.3GHz, with 832GB RAM) are way faster
>>>>>>for floating point and way slower for latency than the old R14ks.
>>>>>>
>>>>>>Possibly the slow latency is partly because of the interesting idea to run
>>>>>>Red Hat 7.2 with the unmodified Linux kernel 2.4.19 on it. Let's blame the
>>>>>>economic times that cause this Dutch habit of saving money :)
>>>>>>
>>>>>>A good example, from several different research projects there which I could
>>>>>>speed up with just 5 minutes of work, is that several projects lose nearly all
>>>>>>of their system time to an RNG, because for each number in the calculation
>>>>>>matrix they take a number from the RNG.
>>>>>>
>>>>>>They compile of course with option -O2.
>>>>>>
>>>>>>Their RNG is some slow thing that runs in 32 bits.
>>>>>>
>>>>>>However, for my latency test I made a very small effort to speed up an already
>>>>>>fast RNG a little. Replacing their RNG with this one would speed up their field
>>>>>>calculations quite a lot.
>>>>>>
>>>>>>The matrix calculations they do then are pretty fast by the way as an efficient
>>>>>>library is doing them for them.
>>>>>>
>>>>>>However they could also speed the entire program up incredibly by using 64 bits
>>>>>>integer values instead of floating point.
>>>>>>
>>>>>>Remember both are 64 bit processors. Both the R14k (8MB L2 cache) and the
>>>>>>I2-Madisons which are 3MB L2 cache.
>>>>>>
>>>>>>The research these guys do then still is very good research.
>>>>>>
>>>>>>No i won't mention a single name of the guys. They are cool guys.
>>>>>>
>>>>>>Best regards,
>>>>>>Vincent
>>>>>
>>>>>
>>>>>
>>>>>But I have seen some commercial ANN applications out there. Surely these have
>>>>>optimized integer arithmetic, because there must be an economical incentive to
>>>>>do so.
>>>>>
>>>>
>>>>Of course i didn't try all of them. Just a few. but the few i tried were not
>>>>doing very well and dead slow in fact.
>>>>
>>>>Because just consider who buys such software, then you already know that
>>>>features and logics used are for them more important than speed of the logics.
>>>>
>>>>I have to admit that I did my experiments with dead slow code too, because
>>>>the networks I used didn't have 10000 neurons like, for example, the one Dan
>>>>Thies used to tune a chess evaluation (using a normal chess program and letting
>>>>the ANN learn the evaluation including material values etc). For its time his
>>>>ANN was very expensive and very fast (it could do 10000 evaluations a second on
>>>>the 10000 neuron network).
>>>
>>>
>>>That's 1 cycle per neuron on a 100MHz computer, or 10 cycles per neuron on a
>>>1GHz computer.
>>
>>His network was very expensive. Obviously it was not done in software.
>>
>>You cannot say it is 1 cycle per neuron on a 100MHz computer, because you cannot
>>simulate such networks even on a 2GHz computer.
>>
>>For general ANN software that calculates them you need slow 'for loops' that
>>first index where to start optimizing then with that slow for loop they one by
>>one check all the inputs and resulting outputs. Then the outputs get stored in a
>>slow array again and the calculation process starts fresh.
>>
>>So using 10 cycles a neuron is pretty impossible for the commercial neural
>>networks.
>>
>>On the other hand such networks are very well parallelizable, so sometimes
>>all you need is a big network with many computers, and then it is 90%
>>parallelizable if not higher.
>>
>>This of course only for big networks.
>>
>>>
>>>
>>>>It still has to beat diep version 1.2 though (diep version 1.2 could not, on
>>>>the tested hardware, deliver mate in KRK if I remember well).
>>>
>>>
>>>Maybe using it for the evaluation is not the most efficient use of a neural
>>>network in a chess program. It seems that the way human players manage to search
>>>the tree is vastly underestimated.
>>
>>For search an ANN is even less well suited. Humans make many mistakes in search.
>>A chessprogram with a human evaluation (even from a 2100 player) is directly an
>>unbeatable world champion of course.
>
>1)There is no definition for a chess program with a human evaluation at the
>level of 2100 player so the last sentence is meaningless.

There is no definition of an Uri posting so your posts have no official status.

>2)There are things that programs evaluate better than humans.

There are 1600 rated players on steroids who play like IMs. The proof for example
is the US open where some players win the first prize with play i would swear is
above the U1600 or U1800 category they enter.

>Humans do not grasp all the board in 1/2 second and they may forget an important
>positional factor when they evaluate a position(for example they may not pay
>attention for the fact that a pawn is weak not because of lack of knowledge but
>because they do not grasp all the board).
>
>Computers do not have that problem.
>Humans have other advantages but the bottom line is that you cannot compare.

I am sure you will never be able to decide upon anything and that you won't even
accept that 1 + 1 = 2, because you'll conclude that it is possible to count in
the binary system and then it is 10.

Therefore i award you the Uri-posting prize!

>Uri

```
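
The integer-only idea discussed in the thread (8-bit neuron outputs, an integer accumulator for sigma(W*I), and a table lookup for the activation) can be sketched roughly as follows. This is an illustrative sketch, not code from any of the posters: the names `WEIGHT_SCALE`, `SHIFT`, `TABLE_SIZE`, `max_sum_bits`, `build_table`, `activate` and `neuron` are hypothetical, and so is the fixed-point scaling. Python is used only for readability; any real speed gain would come from doing the same thing with native integers in C or similar.

```python
ACT_MAX = 127        # activations are signed 8-bit values in -127..127
WEIGHT_SCALE = 32    # assumption: real weight = integer weight / 32
SHIFT = 6            # assumption: one table entry per 2**6 accumulator units
TABLE_SIZE = 4096    # assumption: 4096 << 6 covers 16 inputs (16*127*127)


def max_sum_bits(n_inputs):
    """Bits needed for sigma(W*I) with unsigned 8-bit weights and inputs."""
    return (255 * 255 * n_inputs).bit_length()


def build_table():
    """Precompute x/(1+x) for non-negative sums, quantized to 0..ACT_MAX.

    Floating point is used only here, once, when the table is built.
    """
    table = []
    for idx in range(TABLE_SIZE):
        acc = idx << SHIFT                    # lower edge of this bucket
        x = acc / (ACT_MAX * WEIGHT_SCALE)    # accumulator back in real units
        table.append(round(ACT_MAX * x / (1.0 + x)))
    return table


TABLE = build_table()


def activate(acc):
    """Integer-only activation: shift, clamp, look up, restore the sign."""
    idx = min(abs(acc) >> SHIFT, TABLE_SIZE - 1)
    out = TABLE[idx]
    return out if acc >= 0 else -out


def neuron(weights, inputs):
    """Feedforward for one neuron using integer multiply-accumulate only."""
    acc = 0
    for w, i in zip(weights, inputs):
        acc += w * i                          # each product fits in 16 bits
    return activate(acc)
```

For example, `max_sum_bits(16)` gives 20 and `max_sum_bits(65536)` gives 32, matching the bit counts quoted in the thread. The table stores only the non-negative half of x/(1+|x|) because the function is odd, and the bucket width set by `SHIFT` trades table size against quantization error.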