Author: Christophe Theron
Date: 19:58:51 07/05/03
On July 05, 2003 at 13:55:12, Vincent Diepeveen wrote:

>On July 05, 2003 at 00:25:24, Christophe Theron wrote:
>
>>On July 04, 2003 at 23:56:34, Vincent Diepeveen wrote:
>>
>>>On July 04, 2003 at 11:32:03, Christophe Theron wrote:
>>>
>>>>On July 03, 2003 at 15:44:44, Landon Rabern wrote:
>>>>
>>>>>On July 03, 2003 at 03:22:15, Christophe Theron wrote:
>>>>>
>>>>>>On July 02, 2003 at 13:13:43, Landon Rabern wrote:
>>>>>>
>>>>>>>On July 02, 2003 at 02:18:48, Dann Corbit wrote:
>>>>>>>
>>>>>>>>On July 02, 2003 at 02:03:20, Landon Rabern wrote:
>>>>>>>>[snip]
>>>>>>>>>I made an attempt to use an NN for determining extensions and
>>>>>>>>>reductions. It was evolved using a GA; it kinda worked, but I ran
>>>>>>>>>out of time to work on it at the end of school and I don't have my
>>>>>>>>>computer anymore. The problem is that the NN is SLOW, even using
>>>>>>>>>x/(1+|x|) for activation instead of tanh(x).
>>>>>>>>
>>>>>>>>Precompute a hyperbolic tangent table and store it in an array.
>>>>>>>>Speeds it up a lot.
>>>>>>>
>>>>>>>Well, x/(1+|x|) is as fast as or faster than a large table lookup.
>>>>>>>The slowdown was from all the looping necessary for the feedforward.
>>>>>>>
>>>>>>>Landon
>>>>>>
>>>>>>A stupid question maybe, but I'm very interested in this stuff:
>>>>>>
>>>>>>Do you really need a lot of accuracy for the "activation function"?
>>>>>>Would it be possible to consider a 256-value output, for example?
>>>>>>
>>>>>>Would the lack of accuracy hurt?
>>>>>>
>>>>>>I'm not sure, but it seems to me that biological neurons do not need
>>>>>>a lot of accuracy in their output, and even worse: they are noisy.
>>>>>>So I wonder if low accuracy would be enough.
>>>>>
>>>>>There are neural net models that work with only binary output. If the
>>>>>total input value exceeds some threshold then you get a 1, otherwise
>>>>>a 0. The problem is with training them by backprop. But in this case
>>>>>I was using a genetic algorithm, so no backprop at all, and so no
>>>>>problem. It might work, but I don't see the benefit. Were you thinking
>>>>>of speed? The x/(1+|x|) is pretty fast to calculate, but perhaps the
>>>>>binary (or some other discrete) activation would be faster. Something
>>>>>to try.
>>>>>
>>>>>Landon
>>>>
>>>>Yes, what I had in mind was optimization by using integer arithmetic
>>>>only.
>>>>
>>>>If the output is always on 8 bits, the sigma(W*I) (weight*input) can be
>>>>computed on 32 bits (each W*I will have at most 16 bits).
>>>>
>>>>Actually sigma(W*I) will have no more than 20 bits if each neuron has
>>>>at most 16 inputs. 32 bits allows for 65536 inputs per neuron.
>>>>
>>>>This -maybe- allows for a fast table lookup of the atan function that I
>>>>see used often in ANNs. I think it can be a little faster than
>>>>x/(1+|x|) computed using floating point arithmetic. Also, and this is
>>>>even more important, the sigma(W*I) would use integer arithmetic
>>>>instead of floating point.
>>>>
>>>>Maybe I should just do a Google search for this; I'm sure I'm not the
>>>>first one to think about this optimization.
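To make the idea concrete, here is a rough sketch in C of the kind of
integer-only feedforward I have in mind. It is untested, and the 4096-entry
table, the signed 8-bit weights and the rescaling divisor are just
assumptions picked for the example; the table is filled with x/(1+|x|), but
atan or tanh would work exactly the same way.

    /* Sketch of the integer-only neuron idea described above (untested).
       Assumptions: activations are 8-bit unsigned (0..255), weights are
       signed 8-bit, so each w*i product fits in 16 bits; with at most 16
       inputs the sum stays within 20 bits, well inside a 32-bit int.
       The activation function is a 4096-entry lookup table built once. */

    #include <math.h>

    #define TABLE_BITS 12
    #define TABLE_SIZE (1 << TABLE_BITS)

    static unsigned char act_table[TABLE_SIZE];

    void init_act_table(void)
    {
        int i;
        for (i = 0; i < TABLE_SIZE; i++) {
            /* map table index back to a signed sum, squash it,
               rescale the result to 0..255 */
            double x = (i - TABLE_SIZE / 2) / 256.0;
            double y = x / (1.0 + fabs(x));     /* in (-1, 1) */
            act_table[i] = (unsigned char)((y + 1.0) * 127.5);
        }
    }

    /* one neuron: n inputs, 8-bit activations in[], signed 8-bit weights w[] */
    unsigned char neuron(const unsigned char *in, const signed char *w, int n)
    {
        int sum = 0, i;
        for (i = 0; i < n; i++)
            sum += w[i] * in[i];    /* 16-bit product, 32-bit accumulator */
        /* crude rescale (an assumption) so typical sums land inside the
           table, then clamp and look up */
        sum = sum / 256 + TABLE_SIZE / 2;
        if (sum < 0) sum = 0;
        if (sum >= TABLE_SIZE) sum = TABLE_SIZE - 1;
        return act_table[sum];
    }

The inner loop is a plain integer multiply-accumulate, so it never touches
the floating point unit at all; the only "transcendental" work left is one
table lookup per neuron.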
>>>I'm actually sure you are the first to find this optimization!
>>>
>>>The reason is that the average AI scientist never does many practical
>>>experiments with ANNs. Practical researchers outside the ANN field
>>>basically just do a few experiments sometimes, like you and I. And apart
>>>from that very tiny percentile of researchers who sometimes do ANN
>>>experiments, the average solution they come up with when they need to
>>>calculate faster is either to ask for some system time on a
>>>supercomputer, or, more likely, to fill a sports hall of their own
>>>university department with PCs, lay a few network cables under the
>>>floor, and run a few carefully selected benchmarks showing that their
>>>Beowulf cluster really is a great thing to have.
>>>
>>>About ten years ago, some guy was looking around for funding for either
>>>a supercomputer or hardware to speed up his neural networking software,
>>>which used quite a few neurons. I translated his QuickBasic program into
>>>C and optimized its speed by writing things out and finding clever loops
>>>within it that losslessly sped it up.
>>>
>>>In total I managed to speed up his software by around a factor of 1000
>>>after 7 days of hard work (remember I started with 100KB of QuickBasic
>>>code and ended up with about 20KB of C code; note that the QuickBasic
>>>used was the compiler version, not an interpreter).
>>>
>>>I was very amazed then that he didn't like me doing that, because I had
>>>thought he just wanted his software to run faster. When you grow up you
>>>slowly learn how people work, and in the AI world this is pretty easy to
>>>predict.
>>>
>>>So with that in mind, I am sure that you are one of the first to
>>>publicly speak out and say that you can speed things up a lot!
>>>
>>>Last Tuesday I was at a supercomputing conference and of course for
>>>hours I talked with many researchers and professors. I am still very
>>>proud that I did not tell a single one of them, after hearing what they
>>>did on the computer, that I would love to take a look at their code in
>>>order to give them a few free tips to speed up their software quite a
>>>few times. With some of them I knew for sure that I could.
>>>
>>>Some still haven't figured out the difference between latency and
>>>bandwidth, which processors are out of order (the R14k is), and that the
>>>new Itanium2 processors there (416 of them, clocked at 1.3GHz, with
>>>832GB of RAM) are way faster for floating point and way slower in
>>>latency than the old R14ks.
>>>
>>>Possibly the slow latency is partly because of the interesting idea of
>>>running Red Hat 7.2 with the unmodified Linux kernel 2.4.19 on it. Let's
>>>blame the economic times that cause this Dutch habit of saving money :)
>>>
>>>A good example of several different research projects there which I can
>>>speed up with just 5 minutes of work: several projects lose practically
>>>all of their system time to an RNG, because for each number in the
>>>calculation matrix they take a number from the RNG.
>>>
>>>They compile of course with option -O2.
>>>
>>>Their RNG is some slow thing that runs in 32 bits.
>>>
>>>However, for my latency test I made a very small effort to speed up an
>>>already fast RNG a little. Replacing their RNG with this one would speed
>>>up their field calculations quite a lot.
>>>
>>>The matrix calculations they do are pretty fast, by the way, as an
>>>efficient library does them for them.
>>>
>>>However, they could also speed the entire program up incredibly by using
>>>64-bit integer values instead of floating point.
>>>
>>>Remember both are 64-bit processors: the R14k (8MB L2 cache) and the
>>>Itanium2 Madisons (3MB L2 cache).
>>>
>>>The research these guys do is still very good research, by the way.
>>>
>>>No, I won't mention a single name; they are cool guys.
>>>
>>>Best regards,
>>>Vincent
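Vincent does not say which RNG he used for that latency test, but as an
illustration of the kind of cheap replacement he means, a 64-bit xorshift
generator (Marsaglia) needs only a handful of instructions per number. A
minimal sketch, with an arbitrary nonzero seed:

    /* A minimal 64-bit xorshift generator (Marsaglia), shown only as an
       example of a very cheap RNG; it is not the generator from Vincent's
       test. The state must be seeded with a nonzero value. */

    #include <stdint.h>

    static uint64_t rng_state = 88172645463325252ULL;  /* any nonzero seed */

    uint64_t fast_rand64(void)
    {
        uint64_t x = rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        rng_state = x;
        return x;
    }

Anything this cheap essentially removes the RNG from the profile, provided
its statistical quality is good enough for the simulation at hand.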
>>But I have seen some commercial ANN applications out there. Surely these
>>have optimized integer arithmetic, because there must be an economic
>>incentive to do so.
>
>Of course I didn't try all of them, just a few. But the few I tried were
>not doing very well, and were dead slow in fact.
>
>Just consider who buys such software, and you already know that features
>and the logic used matter more to them than the speed of that logic.
>
>I have to admit that I did my experiments with dead slow code too, because
>the networks I used didn't have 10000 neurons like, for example, the one
>Dan Thies used to tune a chess evaluation (using a normal chess program and
>letting the ANN learn the evaluation, including material values etc). For
>its time his ANN was very expensive and very fast (it could do 10000
>evaluations a second with the 10000-neuron network).

That's 1 cycle per neuron on a 100MHz computer, or 10 cycles per neuron on a
1GHz computer.

>It still has to beat Diep version 1.2 though (Diep version 1.2 could not
>mate with KRK on the tested hardware, if I remember well).

Maybe using it for the evaluation is not the most efficient use of a neural
network in a chess program. It seems that the way human players manage to
search the tree is vastly underestimated.



    Christophe