Computer Chess Club Archives


Subject: Re: a faster neural-network activation function

Author: Landon Rabern

Date: 09:40:11 06/06/01


On June 06, 2001 at 00:13:59, Jim Bell wrote:

>On June 05, 2001 at 12:56:08, Landon Rabern wrote:
>
>>On June 05, 2001 at 08:21:16, Jim Bell wrote:
>>
>>>On June 04, 2001 at 19:00:55, Landon Rabern wrote:
>>>
>>>[SNIP]
>>>>
>>>>I have done some testing with a neural network evaluation in my program for my
>>>>independent study.  The biggest problem I ran into was the slowness of
>>>>calculating all the sigmoids (I actually used tanh(NET)).  It drastically cuts
>>>>down the nps and gets spanked by my handcrafted eval.  I got moderate results
>>>>playing with set ply depths, not set time controls, but that isn't saying much.
>>>>
>>>>Regards,
>>>>
>>>>Landon W. Rabern
>>>
>>>In case you are still interested, you might want to consider what I assume is a
>>>faster activation function: x/(1.0+|x|), where x is the total weighted input to
>>>a node. I read about it in a paper titled "A Better Activation Function for
>>>Artificial Neural Networks", by D.L. Elliott.  I found a link to the paper (in
>>>PDF format) at:
>>>
>>>   "http://www.isr.umd.edu/TechReports/ISR/1993/TR_93-8/TR_93-8.phtml"
>>>
>>>I should warn you that I am certainly no expert when it comes to neural
>>>networks, and I haven't seen this particular activation function used elsewhere,
>>>but it shouldn't be too difficult to replace the tanh(x) and see what happens.
>>>(Of course, you would have to change the derivative function as well!)
>>>
>>>Jim
>>
>>Interesting, I will have to try this.  The curve is not as smooth as the tanh,
>>but unlike the standard 1/(1+e^-x) it does output on (-1, 1).  The derivative will
>>be something like 1/(1+x)^2, but then it needs to take into account the absolute
>>value.  I don't see a way offhand to use the original activation function to
>>produce the derivative quickly, but there must be a way.
>>
>>Regards,
>>
>>Landon W. Rabern
>
>As I recall, I tried a little experiment a couple of years ago in which I
>simulated a simple 3-layer feedforward neural network, using the standard
>y=1/(1+e^-x) squashing function, and then used the back-propagation algorithm to
>teach the network some simple input/output relationships. Then, I tried
>replacing the squashing function with y=x/(1+|x|), and instead of multiplying
>something (I don't remember what) by y(1-y), I multiplied it by (1-|y|)^2,
>because Elliott's paper has something like:
>
>y = 1/(1+e^-x), y' = (e^-x)/(1+e^-x)^2 = y(1-y)
>y = x/(1+|x|) , y' = 1/(1+|x|)^2 = (1-|y|)^2
>
>If memory serves me, the modified program also worked correctly, but I don't
>remember how much faster (or perhaps slower??) the new program ran. I soon
>thereafter deleted the code and I haven't done anything since with neural
>networks.
>
>Jim

OK, yes, y'=(1-|y|)^2 works (since 1-|y| = 1/(1+|x|)), cool.  I will try it when I
get a chance, but there is no time now; it is crunch time at work getting some
products out, so I'm working 80+ hour weeks.  They really like to work their
interns :)
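
For reference, here is a minimal sketch in plain C of the two squashing functions
and their derivatives written in terms of the node output y, which is what makes
the fast one cheap during back-propagation.  This is only an assumption about how
one might code it (the function names are placeholders), not the actual code from
my program or from Elliott's paper:

   #include <math.h>

   /* Standard logistic function: y = 1/(1+e^-x), output in (0, 1). */
   double sigmoid(double x)       { return 1.0 / (1.0 + exp(-x)); }
   /* Its derivative in terms of the output: y' = y(1-y). */
   double sigmoid_deriv(double y) { return y * (1.0 - y); }

   /* Elliott-style activation: y = x/(1+|x|), output in (-1, 1).
      No exp() call, just one fabs() and one divide per node. */
   double elliott(double x)       { return x / (1.0 + fabs(x)); }
   /* Its derivative in terms of the output: y' = (1-|y|)^2. */
   double elliott_deriv(double y) { double t = 1.0 - fabs(y); return t * t; }

   /* For comparison, tanh(x) also outputs in (-1, 1) with y' = 1 - y*y,
      but it needs a slower library tanh() call per node. */

In the back-propagation update the only change should be using elliott_deriv(y)
where y(1-y) (or 1-y^2 for tanh) was used when computing the deltas, just as
described above.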

Regards,

Landon W. Rabern



