Neural Network composed of multiple activation functions

This article discusses a neural network composed of multiple activation functions; it may serve as a useful reference for anyone working through the same problem.

Problem Description

I am using the sknn package to build a neural network. To optimize the parameters of the neural net for the dataset I am working with, I am using an evolutionary algorithm. Since the package allows me to build a neural net where each layer has a different activation function, I was wondering whether that is a practical choice, or whether I should just use one activation function per net. Does having multiple activation functions in a neural net harm the network, make no difference, or benefit it?

Also, what is the maximum number of neurons per layer I should have, and the maximum number of layers per net?
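
For context, here is a minimal sketch of the kind of network the question describes, assuming sknn's documented mlp API (Classifier and Layer); the layer types, unit counts, and hyperparameters are illustrative and not from the original post:

```python
from sknn.mlp import Classifier, Layer

# Each hidden layer gets its own activation ("type"); the output layer uses Softmax.
net = Classifier(
    layers=[
        Layer("Rectifier", units=64),   # ReLU hidden layer
        Layer("Tanh", units=32),        # tanh hidden layer
        Layer("Softmax"),               # output layer
    ],
    learning_rate=0.01,
    n_iter=25,
)
# net.fit(X_train, y_train); net.predict(X_test)  # scikit-learn style interface
```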

Recommended Answer

A neural network is just a (big) mathematical function. You could even use different activation functions for different neurons in the same layer. Different activation functions allow for different non-linearities, which might work better for solving a specific function. Using a sigmoid as opposed to a tanh will only make a marginal difference; what is more important is that the activation has a nice derivative. The reason tanh and sigmoid are usually used is that for values close to 0 they act like a linear function, while for large absolute values they act more like the sign function (saturating at -1 for tanh or 0 for sigmoid, and at 1), and they have a nice derivative. A relatively recently introduced one is the ReLU (max(x, 0)), which has a very simple derivative (except at x = 0), is non-linear, and, importantly, is fast to compute, which makes it a good fit for deep networks with long training times.
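
To make the point about derivatives concrete, here is a small NumPy sketch (not from the original answer) of the three activations mentioned and their derivatives:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # smooth everywhere, saturates for large |x|

def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # also smooth, saturates for large |x|

def relu(x):
    return np.maximum(x, 0.0)

def d_relu(x):
    return (x > 0).astype(float)  # trivially cheap; undefined at x = 0, usually taken as 0
```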

What it comes down to is that this choice is not very important for the global performance; the non-linearity and the capped range are what matter. To squeeze out the last percentage points this choice will matter, but it depends mostly on your specific data. This choice, just like the number of hidden layers and the number of neurons inside those layers, will have to be found by cross-validation, although you could adapt your genetic operators to include these.
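
As a hedged illustration of choosing the architecture by cross-validation, the sketch below loops over a few made-up candidate layouts and scores each one; it assumes sknn's scikit-learn-compatible estimator works with cross_val_score, and the dataset, candidates, and hyperparameters are placeholders, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sknn.mlp import Classifier, Layer

X, y = load_iris(return_X_y=True)   # stand-in dataset; replace with your own

# Candidate architectures: (activation, units) per hidden layer.  Illustrative only.
candidates = [
    [("Rectifier", 32)],
    [("Rectifier", 64), ("Tanh", 32)],
    [("Sigmoid", 64), ("Rectifier", 64), ("Rectifier", 32)],
]

best = None
for spec in candidates:
    layers = [Layer(act, units=n) for act, n in spec] + [Layer("Softmax")]
    net = Classifier(layers=layers, learning_rate=0.01, n_iter=25)
    score = cross_val_score(net, X, y, cv=5).mean()
    if best is None or score > best[0]:
        best = (score, spec)

print("best architecture:", best)
```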

This concludes the article on a neural network composed of multiple activation functions. We hope the recommended answer is helpful, and we hope you will continue to support IT屋!
