Weight Initialisation


Question

I plan to use the Nguyen-Widrow Algorithm for an NN with multiple hidden layers. While researching, I found a lot of ambiguities and I wish to clarify them.

The following is pseudocode for the Nguyen-Widrow algorithm:

      Initialize all weights of the hidden layers with random values
      For each hidden layer{
          beta = 0.7 * Math.pow(hiddenNeurons, 1.0 / numberOfInputs);
          For each synapse{
              For each weight{
                  Adjust the weight: divide it by the norm of the neuron's
                  weight vector, then multiply by beta
              }
          }
      }

I just want to clarify whether the value of hiddenNeurons is the size of the particular hidden layer, or the combined size of all the hidden layers in the network. Looking at various sources has left me confused.

In other words, if I have a network (3-2-2-2-3) (index 0 is the input layer, index 4 is the output layer), would the value of hiddenNeurons be:

NumberOfNeuronsInLayer(1) + NumberOfNeuronsInLayer(2) + NumberOfNeuronsInLayer(3)

Or just

NumberOfNeuronsInLayer(i), where i is the current layer I am at?

EDIT:

So, the hiddenNeurons value would be the size of the current hidden layer, and the input value would be the size of the previous hidden layer?
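
To make the difference concrete, here is a tiny sketch (my own illustration, not from any of the sources) of how beta differs for hidden layer 1 of the 3-2-2-2-3 network under the two interpretations:

#include <math.h>
#include <stdio.h>

int main( void )
{
    /* 3-2-2-2-3 network; hidden layer 1 has 2 neurons and a fan-in of 3 */

    /* Interpretation A: hiddenNeurons = size of the current hidden layer */
    double beta_layer = 0.7 * pow( 2.0, 1.0 / 3.0 );             /* ~0.882 */

    /* Interpretation B: hiddenNeurons = total size of all hidden layers */
    double beta_total = 0.7 * pow( 2.0 + 2.0 + 2.0, 1.0 / 3.0 ); /* ~1.272 */

    printf( "per layer: %f, total: %f\n", beta_layer, beta_total );
    return 0;
}

The two readings give noticeably different scaling factors (roughly 0.88 versus 1.27), which is why I want to pin this down.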

Solution

Sounds to me like you want more precise code. Here are some actual code lines from a project I'm participating in. I hope you read C. It's a bit abstracted and simplified. There is a struct nn that holds the neural net data. You probably have your own abstract data type.

Code lines from my project (somewhat simplified):

#include <math.h>    /* powf, sqrtf, fabsf */
#include <stdlib.h>  /* rand, RAND_MAX */

float *w = nn->the_weight_array;
float factor = 0.7f * powf( (float) nn->n_hidden, 1.0f / nn->n_input );
unsigned int i;

/* Step 1: initialize all n_input * n_hidden weights with random values */
for( i = nn->n_input * nn->n_hidden; i; i-- )
    *w++ = random_range( -factor, factor );

/* Step 2 (Nguyen/Widrow): rescale each block of n_hidden weights
 * so that its Euclidean norm equals factor */
w = nn->the_weight_array;
for( i = nn->n_input; i; i-- ){
    _scale_nguyen_widrow( factor, w, nn->n_hidden );
    w += nn->n_hidden;
}

Functions called:

/* Scale a weight vector so that its Euclidean norm becomes `factor` */
static void _scale_nguyen_widrow( float factor, float *vec, unsigned int size )
{
    unsigned int i;
    float magnitude = 0.0f;
    for ( i = 0; i < size; i++ )
        magnitude += vec[i] * vec[i];

    magnitude = sqrtf( magnitude );
    if ( magnitude == 0.0f )   /* all-zero vector: nothing to scale */
        return;

    for ( i = 0; i < size; i++ )
        vec[i] *= factor / magnitude;
}

/* Uniform random float in [min, max] */
static inline float random_range( float min, float max )
{
    float range = fabsf( max - min );
    return ((float) rand() / (float) RAND_MAX) * range + min;
}
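
Note that the snippet above handles a single weight matrix. For the multiple hidden layers in your question, I would simply repeat the scheme per layer. Here is a rough sketch of that idea, reusing the two helpers above; the function name and the fields n_layers, layer_size[] and weights[] are made up for illustration, not from the project:

/* Hypothetical per-layer variant: repeat the scheme above for every
 * hidden layer, using the previous layer's size as the fan-in. */
void nguyen_widrow_init_all( unsigned int n_layers,
                             const unsigned int *layer_size,
                             float **weights )
{
    unsigned int l, i;
    for( l = 1; l + 1 < n_layers; l++ ){    /* indices 1..n-2 are hidden */
        unsigned int fan_in   = layer_size[l - 1];
        unsigned int n_hidden = layer_size[l];
        float factor = 0.7f * powf( (float) n_hidden, 1.0f / fan_in );
        float *w = weights[l - 1];          /* weight matrix into layer l */

        for( i = fan_in * n_hidden; i; i-- )
            *w++ = random_range( -factor, factor );

        w = weights[l - 1];
        for( i = fan_in; i; i-- ){
            _scale_nguyen_widrow( factor, w, n_hidden );
            w += n_hidden;
        }
    }
}

This sketch follows the per-layer interpretation: hiddenNeurons is the size of the current layer and the fan-in is the size of the previous layer, which is effectively what my single-layer code above does.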

Tip:
After you've implemented the Nguyen/Widrow weight initialization, you can add a small line of code to the forward calculation that dumps each activation to a file. Then you can check how well the set of neurons hits the useful range of the activation function. Find the mean and standard deviation. You can even plot it with a plotting tool, e.g. gnuplot. (You need a plotting tool like gnuplot anyway for plotting error rates etc.) I did that for my implementation. The plots came out nice, and initial learning became much faster using Nguyen/Widrow in my project.
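
A rough sketch of that dumping idea (my own illustration; the file name and the helper are assumptions, and you call the helper from wherever your forward pass computes an activation):

#include <stdio.h>

static FILE *dumpfile = NULL;

/* Call this from the forward pass after each activation is computed */
static void dump_activation( float activation )
{
    if ( !dumpfile )
        dumpfile = fopen( "activations.dat", "w" );
    if ( dumpfile )
        fprintf( dumpfile, "%f\n", activation );
}

A one-column file like that can be summarized or plotted directly in gnuplot, e.g. plot 'activations.dat'.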

PS: I'm not sure my implementation is correct according to Nguyen and Widrow's intentions. I don't even think I care, as long as it improves the initial learning.

Good luck,
-Øystein
