Author: Uri Blass
Date: 21:25:49 07/05/03
On July 05, 2003 at 22:58:51, Christophe Theron wrote:

>On July 05, 2003 at 13:55:12, Vincent Diepeveen wrote:
>
>>On July 05, 2003 at 00:25:24, Christophe Theron wrote:
>>
>>>On July 04, 2003 at 23:56:34, Vincent Diepeveen wrote:
>>>
>>>>On July 04, 2003 at 11:32:03, Christophe Theron wrote:
>>>>
>>>>>On July 03, 2003 at 15:44:44, Landon Rabern wrote:
>>>>>
>>>>>>On July 03, 2003 at 03:22:15, Christophe Theron wrote:
>>>>>>
>>>>>>>On July 02, 2003 at 13:13:43, Landon Rabern wrote:
>>>>>>>
>>>>>>>>On July 02, 2003 at 02:18:48, Dann Corbit wrote:
>>>>>>>>
>>>>>>>>>On July 02, 2003 at 02:03:20, Landon Rabern wrote:
>>>>>>>>>[snip]
>>>>>>>>>>I made an attempt to use a NN for determining extensions and reductions. It was evolved using a GA and kinda worked, but I ran out of time to work on it at the end of school and don't have my computer anymore. The problem is that the NN is SLOW, even using x/(1+|x|) for activation instead of tanh(x).
>>>>>>>>>
>>>>>>>>>Precompute a hyperbolic tangent table and store it in an array. That speeds it up a lot.
>>>>>>>>
>>>>>>>>Well, x/(1+|x|) is as fast as or faster than a large table lookup. The slowdown was from all the looping necessary for the feedforward.
>>>>>>>>
>>>>>>>>Landon
>>>>>>>
>>>>>>>A stupid question maybe, but I'm very interested in this stuff:
>>>>>>>
>>>>>>>Do you really need a lot of accuracy for the "activation function"? Would it be possible to consider a 256-value output, for example?
>>>>>>>
>>>>>>>Would the lack of accuracy hurt?
>>>>>>>
>>>>>>>I'm not sure, but it seems to me that biological neurons do not need a lot of accuracy in their output, and even worse: they are noisy. So I wonder if low accuracy would be enough.
>>>>>>
>>>>>>There are neural net models that work with only binary output. If the total input value exceeds some threshold you get a 1, otherwise a 0. The problem is with training them by backprop. But in this case I was using a genetic algorithm, so no backprop at all - so no problem. It might work, but I don't see the benefit - were you thinking of speed? The x/(1+|x|) is pretty fast to calculate, but perhaps the binary (or other discrete) output would be faster. Something to try.
>>>>>>
>>>>>>Landon
>>>>>
>>>>>Yes, what I had in mind was optimization by using integer arithmetic only.
>>>>>
>>>>>If the output is always on 8 bits, the sigma(W*I) (weight*input) can be computed on 32 bits (each W*I will have at most 16 bits).
>>>>>
>>>>>Actually sigma(W*I) will have no more than 20 bits if each neuron has at most 16 inputs. 32 bits allows for 65536 inputs per neuron.
>>>>>
>>>>>This -maybe- allows for a fast table lookup of the atan function that I often see used in ANNs. I think it can be a little faster than x/(1+|x|) computed using floating point arithmetic. Also, and this is even more important, the sigma(W*I) would use integer arithmetic instead of floating point.
>>>>>
>>>>>Maybe I should just do a Google search for this; I'm sure I'm not the first one to think about this optimization.
>>>>
>>>>I'm actually sure you are the first to find this optimization!
>>>>
>>>>The reason is that the average AI scientist never does many practical experiments with ANNs. Basically, practical researchers outside ANNs sometimes do a few experiments like you and I do. Beyond that very tiny percentile of researchers who sometimes do ANN experiments, the average solution they come up with when they need to calculate faster is either to ask for some system time on a supercomputer, or more likely to fill a sports hall of their own university department with PCs, lay a few network cables under the floor, and run a few carefully selected benchmarks showing that their Beowulf cluster really is a great thing to have.
>>>>
>>>>When, ten years or so ago, some guy was looking around for funding for either a supercomputer or hardware to speed up his neural network software, which used quite a few neurons, I translated his QuickBASIC program into C and optimized its speed by writing things out and finding clever loops within it that sped it up losslessly.
>>>>
>>>>In total I managed to speed up his software by around a factor of 1000 after 7 days of hard work (remember, I started with 100KB of QuickBASIC code and ended up with about 20KB of C code; note that the QuickBASIC used was the compiler version, not an interpreter).
>>>>
>>>>I was very amazed then that he didn't like me doing that, because I had thought he just wanted his software to run faster. When you grow up you slowly learn how people work, and in the AI world this is pretty easy to predict.
>>>>
>>>>So with that in mind I am sure that you are one of the first to publicly speak out and say that you can speed things up a lot!
>>>>
>>>>Last Tuesday I was at a supercomputing conference and of course for hours I talked with many researchers and professors. I am still very proud that I did not tell a single one of them, after hearing what they do on the computer, that I would love to take a look at their code in order to give them a few free tips to speed up their software quite a few times. With some of them I knew for sure I could.
>>>>
>>>>Some still haven't found out the difference between latency and bandwidth, which processors are out of order (the R14k processors), and that the new Itanium2 processors here (416 of them clocked at 1.3GHz, with 832GB of RAM) are way faster for floating point and way slower for latency than the old R14ks.
>>>>
>>>>Possibly the slow latency is partly because of the interesting idea to run Red Hat 7.2 with the unmodified Linux kernel 2.4.19 on it. Let's blame the economic times that cause this Dutch habit of saving money :)
>>>>
>>>>A good example of several different research projects there which I could speed up with just 5 minutes of work: several projects lose practically all of their system time to an RNG, because for each number in the calculation matrix they take a number from the RNG.
>>>>
>>>>They compile, of course, with option -O2.
>>>>
>>>>Their RNG is some slow thing that runs in 32 bits.
>>>>
>>>>However, for my latency test I made a very small effort to speed up an already fast RNG a little. Replacing their RNG with this one would speed up their field calculations quite a lot.
>>>>
>>>>The matrix calculations they do are pretty fast, by the way, as an efficient library is doing them for them.
>>>>
>>>>However, they could also speed the entire program up incredibly by using 64-bit integer values instead of floating point.
>>>>
>>>>Remember, both are 64-bit processors: the R14k (8MB L2 cache) and the I2-Madisons, which have 3MB of L2 cache.
>>>>
>>>>The research these guys do is still very good research.
>>>>
>>>>No, I won't mention a single name of the guys. They are cool guys.
>>>>
>>>>Best regards,
>>>>Vincent
>>>
>>>But I have seen some commercial ANN applications out there. Surely these have optimized integer arithmetic, because there must be an economic incentive to do so.
>>
>>Of course I didn't try all of them, just a few, but the few I tried were not doing very well and were dead slow in fact.
>>
>>Just consider who buys such software, and you already know that for them the features and the logic used are more important than the speed of the logic.
>>
>>I have to admit that I did my experiments with dead slow code too, because the networks I used didn't have 10000 neurons like, for example, the one Dan Thies used to tune a chess evaluation (using a normal chess program and letting the ANN learn the evaluation, including material values etc.). For its time his ANN was very expensive and very fast (it could do 10000 evaluations a second on the 10000-neuron network).
>
>That's 1 cycle per neuron on a 100MHz computer, or 10 cycles per neuron on a 1GHz computer.
>
>>It still has to beat Diep version 1.2 though (Diep version 1.2 could not, on the tested hardware, deliver mate with KRK if I remember well).
>
>Maybe using it for the evaluation is not the most efficient use of a neural network in a chess program. It seems that the way human players manage to search the tree is vastly underestimated.

Christophe, I agree with you that search is underestimated in chess, but I also believe that search and evaluation are connected, because a lot of search decisions are based on the evaluation of positions that are not leaf positions. So you cannot separate them and say a search improvement gives x Elo and an evaluation improvement gives y Elo.

Uri
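[Editor's note] For readers who want to see the integer-only scheme Christophe sketches above in concrete form - 8-bit activations, sigma(W*I) accumulated in a 32-bit integer, and the squashing function replaced by a precomputed table - here is a minimal C sketch. The constants, the Q-style scaling, and the function names (init_act_tbl, neuron) are illustrative assumptions, not code from Chess Tiger, Diep, or Landon's program.

#include <math.h>
#include <stdint.h>

/* Illustrative fixed-point layer (assumptions, not anyone's engine code):
 *   - activations: signed 8-bit, 127 represents 1.0
 *   - weights:     signed 8-bit, 64 represents 1.0
 *   - the squashing function is a precomputed table indexed by the
 *     scaled 32-bit integer sum, replacing a tanh()/atan() call.        */
enum {
    ACT_ONE  = 127,   /* activation value meaning 1.0                  */
    W_ONE    = 64,    /* weight value meaning 1.0                      */
    SUM_DIV  = 64,    /* raw sum / SUM_DIV gives the table index       */
    TBL_HALF = 2048   /* table covers indices [-TBL_HALF, TBL_HALF-1]  */
};

static int8_t act_tbl[2 * TBL_HALF];

/* Build the table once: integer pre-activation -> squashed 8-bit output. */
void init_act_tbl(void)
{
    for (int i = -TBL_HALF; i < TBL_HALF; i++) {
        double x = (double)(i * SUM_DIV) / (ACT_ONE * W_ONE); /* real units */
        double y = x / (1.0 + fabs(x));   /* or tanh(x): same idea          */
        act_tbl[i + TBL_HALF] = (int8_t)(y * ACT_ONE);
    }
}

/* One neuron: sigma(W*I) in 32-bit integer arithmetic (each 8x8-bit
 * product fits in 16 bits, as noted above), then one table lookup.     */
int8_t neuron(const int8_t *w, const int8_t *in, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)w[i] * in[i];

    int32_t idx = sum / SUM_DIV;                  /* shrink to table range */
    if (idx < -TBL_HALF)    idx = -TBL_HALF;
    if (idx > TBL_HALF - 1) idx = TBL_HALF - 1;   /* tails are flat anyway */
    return act_tbl[idx + TBL_HALF];
}

The clamp costs almost nothing in accuracy because a saturating activation is flat in its tails; all the floating-point work is paid once in init_act_tbl(), and the per-neuron inner loop is pure integer arithmetic plus one array access.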
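[Editor's note] On the RNG anecdote: Vincent does not say which generator he used for his latency test, so the following is only an illustration of the kind of cheap 64-bit integer generator that can replace a slow 32-bit library RNG in code that draws one random number per matrix element. This is Marsaglia's xorshift64, a few instructions per number.

#include <stdint.h>

/* Marsaglia xorshift64 - an example of a very cheap 64-bit generator;
 * the posts above do not say which RNG was actually involved.
 * Not suitable for cryptography.                                       */
static uint64_t rng_state = 88172645463325252ULL;  /* any nonzero seed  */

static uint64_t xorshift64(void)
{
    uint64_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return rng_state = x;
}

/* Uniform double in [0,1) built from the top 53 bits, for codes that
 * fill a floating-point matrix from the RNG as described above.        */
static double rng_uniform(void)
{
    return (xorshift64() >> 11) * (1.0 / 9007199254740992.0);  /* / 2^53 */
}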