Computer Chess Club Archives



Subject: Re: chess and neural networks

Author: Vincent Diepeveen

Date: 20:56:34 07/04/03


On July 04, 2003 at 11:32:03, Christophe Theron wrote:

>On July 03, 2003 at 15:44:44, Landon Rabern wrote:
>
>>On July 03, 2003 at 03:22:15, Christophe Theron wrote:
>>
>>>On July 02, 2003 at 13:13:43, Landon Rabern wrote:
>>>
>>>>On July 02, 2003 at 02:18:48, Dann Corbit wrote:
>>>>
>>>>>On July 02, 2003 at 02:03:20, Landon Rabern wrote:
>>>>>[snip]
>>>>>>I made an attempt to use a NN for determining extensions and reductions.  It was
>>>>>>evolved using a GA, kinda worked, but I ran out of time to work on it at the
>>>>>>end of school and don't have my computer anymore. The problem is that the NN is
>>>>>>SLOW, even using x/(1+|x|) for activation instead of tanh(x).
>>>>>
>>>>>Precompute a hyperbolic tangent table and store it in an array.  Speeds it up a
>>>>>lot.
>>>>
>>>>Well, x/(1+|x|) is as fast or faster than a large table lookup.  The slowdown
>>>>was from all the looping necessary for the feedforward.
>>>>
>>>>Landon
>>>
>>>
>>>
>>>A stupid question maybe, but I'm very interested in this stuff:
>>>
>>>Do you really need a lot of accuracy for the "activation function"? Would it be
>>>possible to consider a 256-value output, for example?
>>>
>>>Would the lack of accuracy hurt?
>>>
>>>I'm not sure, but it seems to me that biological neurons do not need a lot of
>>>accuracy in their output, and even worse: they are noisy. So I wonder if low
>>>accuracy would be enough.
>>>
>>
>>There are neural net models that work with only binary output.  If the total
>>input value exceeds some threshold then you get a 1, otherwise a 0.  The problem
>>is with training them by back prop.  But in this case I was using a Genetic Alg,
>>so no back prop at all - so no problem.  It might work, but I don't see the
>>benefit - were you thinking of speed?  The x/(1+|x|) is pretty fast to
>>calculate, but perhaps the binary (or some other discrete) version would be faster.
>>Something to try.
>>
>>Landon
>
>
>
>Yes, what I had in mind was optimization by using integer arithmetic only.
>
>If the output is always on 8 bits, the sigma(W*I) (weight*input) can be computed
>on 32 bits (each W*I will have at most 16 bits).
>
>Actually sigma(W*I) will have no more than 20 bits if each neuron has at most 16
>inputs. 32 bits allows for 65536 inputs per neuron.
>
>This -maybe- allows for a fast table lookup of the atan function that I see used
>often in ANNs. I think it can be a little faster than x/(1+|x|) computed using
>floating point arithmetic. Also, and this is even more important, the sigma(W*I)
>would use integer arithmetic instead of floating point.
>
>Maybe I should just do a Google search for this, I'm sure I'm not the first one
>to think about this optimization.

I'm actually sure you are the first to find this optimization!

The reason is that the average AI scientist rarely does many practical
experiments with ANNs. Practical researchers outside the ANN field just do a few
experiments now and then, like you and me. Further, among that very tiny
percentile of researchers who do run ANN experiments, the average solution they
come up with when they need to calculate faster is either to ask for some system
time on a supercomputer or, more likely, to fill a sports hall of their own
university department with PCs, lay down a few network cables under the floor,
and run a few carefully selected benchmarks showing that their Beowulf cluster
really is a great thing to have.

About ten years ago, some guy was looking for funding for either a supercomputer
or hardware to speed up his neural network software, which used quite a few
neurons. I translated his QuickBasic program into C and optimized its speed by
writing things out and finding clever loops within it that sped it up without
changing the results.

In total I managed to speed up his software by around a factor of 1000 after 7
days of hard work (remember, I started with 100KB of QuickBasic code and ended
up with about 20KB of C code; note that the QuickBasic used was the compiler
version, not an interpreter).

I was very amazed that he didn't like me doing that, because I had thought he
just wanted his software to run faster. As you grow up you slowly learn how
people work, and in the AI world this is pretty easy to predict.

So with that in mind, I am sure that you are one of the first to publicly say
that you can speed things up a lot!
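
To make the integer-only idea concrete, here is a rough sketch of how it could
look in C (an illustration with made-up names and sizes, not code from any
actual engine): weights and activations stored as signed 8-bit values in Q7
fixed point, a 32-bit accumulator for sigma(W*I), and the activation applied
through a small precomputed table. I fill the table with x/(1+|x|), but tanh or
atan would work exactly the same way.

  #include <math.h>
  #include <stdint.h>

  #define ACT_TABLE_BITS 11                    /* 2048-entry table */
  #define ACT_TABLE_SIZE (1 << ACT_TABLE_BITS)

  static int8_t act_table[ACT_TABLE_SIZE];

  /* Fill the table once at startup: index i covers accumulator values
     mapped to the range [-8.0, +8.0), output scaled back to Q7. */
  static void init_act_table(void)
  {
      int i;
      for (i = 0; i < ACT_TABLE_SIZE; i++) {
          double x = ((double)i - ACT_TABLE_SIZE / 2) * 16.0 / ACT_TABLE_SIZE;
          double y = x / (1.0 + fabs(x));      /* or tanh(x), or atan(x) */
          act_table[i] = (int8_t)(y * 127.0);
      }
  }

  /* One neuron: integer multiply-accumulate plus one table lookup.
     Inputs and weights are Q7; with up to 65536 inputs the 32-bit sum
     cannot overflow, matching the bit counting quoted above. */
  static int8_t neuron(const int8_t *inputs, const int8_t *weights, int n)
  {
      int32_t sum = 0;
      int32_t idx;
      int i;
      for (i = 0; i < n; i++)
          sum += (int32_t)inputs[i] * (int32_t)weights[i];

      idx = (sum >> 7) + ACT_TABLE_SIZE / 2;   /* rescale Q14 sum to a table index */
      if (idx < 0) idx = 0;
      if (idx >= ACT_TABLE_SIZE) idx = ACT_TABLE_SIZE - 1;
      return act_table[idx];
  }

The inner loop is then nothing but integer multiply-adds, and the activation
costs one clamped table lookup per neuron.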

Last Tuesday I was at a supercomputing conference, and of course I talked for
hours with many researchers and professors. I am still very proud that, after
hearing what each of them did on the computer, I didn't tell a single one that I
would love to take a look at their code in order to give them a few free tips to
speed up their software by quite a few times. With some of them I knew for sure
that I could.

Some still haven't figured out the difference between latency and bandwidth,
what out-of-order execution is (the R14k processors are out of order), or that
the new Itanium2 processors here (416 of them, clocked at 1.3GHz, with 832GB of
RAM) are way faster for floating point and way worse for latency than the old
R14ks.

Possibly the poor latency is partly due to the interesting idea of running Red
Hat 7.2 with an unmodified Linux kernel 2.4.19 on it. Let's blame the economic
times for this Dutch habit of saving money :)

A good example of research projects there that I could speed up with just 5
minutes of work: several of them lose nearly all of their system time to an RNG,
because for each number in the calculation matrix they draw a number from the
RNG.

Of course they compile with option -O2.

Their RNG is some slow thing that runs in 32 bits.

However, for my latency test I had made a small effort to speed up an already
fast RNG a little. Replacing their RNG with that one would speed up their field
calculations quite a lot.
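
I won't paste that generator here, but to show the kind of replacement I mean
(this particular generator is only an illustration, not the one I actually
used): a 64-bit linear congruential generator with Knuth's MMIX constants costs
one multiply and one add per number, all in integer arithmetic.

  #include <stdint.h>

  /* Illustration only: a 64-bit linear congruential generator using the
     multiplier and increment from Knuth's MMIX.  One multiply and one add
     per random number. */
  static uint64_t rng_state = 1;               /* any seed will do */

  static uint64_t fast_rand64(void)
  {
      rng_state = rng_state * 6364136223846793005ULL + 1442695040888963407ULL;
      return rng_state;
  }

  /* Uniform double in [0,1) from the top 53 bits, e.g. for filling the
     calculation matrix. */
  static double fast_rand_double(void)
  {
      return (double)(fast_rand64() >> 11) * (1.0 / 9007199254740992.0);
  }

Whether its statistical quality is good enough depends on what the field
calculation needs; the point is only that the per-number cost drops to a handful
of cycles.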

The matrix calculations they do are pretty fast, by the way, since an efficient
library does them for them.

However, they could also speed up the entire program enormously by using 64-bit
integer values instead of floating point.

Remember, both are 64-bit processors: the R14k (8MB L2 cache) and the Itanium2
Madisons (3MB L2 cache).
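
Whether that wins depends on the code (floating point is exactly what the
Itanium2 is fast at), but the kind of rewrite I mean looks roughly like this for
a simple dot product, using a Q32.32 fixed-point format picked just for the
example:

  #include <stdint.h>

  /* Values stored as 64-bit fixed point: raw = value * 2^32 (Q32.32).
     Shifting each operand down to Q16.16 before multiplying keeps the
     product in Q32.32 without overflowing 64 bits, at the cost of some
     precision in the low bits.  Assumes the accumulated sum stays in
     range.  The floating point original would simply be
     sum += a[i] * b[i] with doubles. */
  static int64_t dot_fixed(const int64_t *a, const int64_t *b, int n)
  {
      int64_t sum = 0;
      int i;
      for (i = 0; i < n; i++)
          sum += (a[i] >> 16) * (b[i] >> 16);
      return sum;                              /* result is again Q32.32 */
  }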

The research these guys do is still very good research.

No, I won't mention any of their names. They are cool guys.

Best regards,
Vincent





>
>
>    Christophe


