Computer Chess Club Archives

Subject: Re: a faster neural-network activation function

Author: Jim Bell

Date: 21:13:59 06/05/01

On June 05, 2001 at 12:56:08, Landon Rabern wrote:

>On June 05, 2001 at 08:21:16, Jim Bell wrote:
>
>>On June 04, 2001 at 19:00:55, Landon Rabern wrote:
>>
>>[SNIP]
>>>
>>>I have done some testing with a neural network evaluation in my program for my
>>>independent study.  The biggest problem I ran into was the slowness of
>>>calculating all the sigmoids (I actually used tanh(NET)).  It drastically cuts
>>>down the nps and gets spanked by my handcrafted eval.  I got moderate results
>>>playing with set ply depths, no set time controls, but that isn't saying much.
>>>
>>>Regards,
>>>
>>>Landon W. Rabern
>>
>>In case you are still interested, you might want to consider what I assume is a
>>faster activation function: x/(1.0+|x|), where x is the total weighted input to
>>a node. I read about it in a paper titled "A Better Activation Function for
>>Artificial Neural Networks", by D.L. Elliott.  I found a link to the paper (in
>>PDF format) at:
>>
>>   "http://www.isr.umd.edu/TechReports/ISR/1993/TR_93-8/TR_93-8.phtml"
>>
>>I should warn you that I am certainly no expert when it comes to neural
>>networks, and I haven't seen this particular activation function used elsewhere,
>>but it shouldn't be too difficult to swap it in for tanh(x) and see what
>>happens. (Of course, you would also have to change the derivative function!)
>>
>>Jim
>
>Interesting, I will have to try this.  The curve is not as smooth as the tanh,
>but unlike the standard 1/(1+e^-x) it does output on (-1,1).  The derivative
>will be something like 1/(1+x)^2, but then one needs to take the absolute
>value into account.  I don't see a way offhand to use the original activation
>function to produce the derivative quickly, but there must be a way.
>
>Regards,
>
>Landon W. Rabern
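
For concreteness, here is a rough sketch of the function under discussion next
to the usual alternatives. I will use C just for illustration; I am not
assuming anything about what your actual program looks like. The point is that
x/(1+|x|) costs one fabs, one add, and one divide, while the others hide at
least one transcendental library call:

   #include <math.h>

   /* Elliott's squashing function: no exp() call needed. */
   double elliott(double x)  { return x / (1.0 + fabs(x)); }

   /* The usual choices, each needing exp() under the hood. */
   double logistic(double x) { return 1.0 / (1.0 + exp(-x)); }
   /* tanh(x) comes straight from the math library. */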

As I recall, I tried a little experiment a couple of years ago in which I
simulated a simple 3-layer feedforward neural network, using the standard
y=1/(1+e^-x) squashing function, and then used the back-propagation algorithm to
teach the network some simple input/output relationships. Then, I tried
replacing the squashing function with y=x/(1+|x|), and instead of multiplying
something (I don't remember what) by y(1-y), I multiplied it by (1-|y|)^2,
because Elliott's paper has something like:

y = 1/(1+e^-x),  y' = (e^-x)/(1+e^-x)^2 = y(1-y)
y = x/(1+|x|),   y' = 1/(1+|x|)^2       = (1-|y|)^2
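
In code, both derivatives can be computed from a node's output y alone, with
no second look at x, which I believe is the quick route Landon was asking
about above. A minimal sketch, again just for illustration:

   #include <math.h>

   /* Derivative of 1/(1+e^-x), written in terms of the output y. */
   double logistic_deriv(double y) { return y * (1.0 - y); }

   /* Derivative of x/(1+|x|), also in terms of the output y.     */
   double elliott_deriv(double y)
   {
       double t = 1.0 - fabs(y);
       return t * t;
   }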

If memory serves, the modified program also worked correctly, but I don't
remember how much faster (or perhaps slower??) the new program ran. Soon
thereafter I deleted the code, and I haven't done anything with neural
networks since.
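
To show where the substitution lands, here is a hedged sketch of an
output-layer delta loop in plain back-propagation; the names (n_out, y,
target, delta) are made up for illustration and are not from my old program.
The derivative factor is the only thing that changes between the two
squashing functions:

   #include <math.h>

   /* Output-layer deltas for one training pattern.  With the logistic
      function the derivative factor would be y[i]*(1.0 - y[i]); with
      Elliott's x/(1+|x|) it becomes (1-|y|)^2, as above.             */
   void output_deltas(int n_out, const double y[], const double target[],
                      double delta[])
   {
       int i;
       for (i = 0; i < n_out; i++) {
           double t = 1.0 - fabs(y[i]);
           delta[i] = (target[i] - y[i]) * t * t;
       }
   }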

Jim


