Problems With ANN BackProp/Gradient Checking

Problem Description

I just wrote my first neural network class in Python. As far as I can tell everything should work, but there is some bug in it that I can't seem to find (probably staring me right in the face). I first tried it on 10,000 examples of the MNIST data, then again when trying to replicate the sign function, and again when trying to replicate an XOR gate. Every time, regardless of the number of epochs, it produces output from all of the output neurons (however many there may be) that is roughly the same value, yet the cost function seems to be going down. I am using batch gradient descent, all done with vectors (no loop over individual training examples).

#Neural Network Class

import numpy as np



class NeuralNetwork:

    #methods
    def __init__(self,layer_shape):
        #Useful Network Info
        self.__layer_shape = layer_shape
        self.__layers = len(layer_shape)

        #Initialize Random Weights
        self.__weights = []
        self.__weight_sizes = []
        for i in range(len(layer_shape)-1):
            current_weight_size = (layer_shape[i+1],layer_shape[i]+1)
            self.__weight_sizes.append(current_weight_size)
            self.__weights.append(np.random.normal(loc=0.1,scale=0.1,size=current_weight_size))

    def sigmoid(self,z):
        return (1/(1+np.exp(-z)))

    def sig_prime(self,z):
        return np.multiply(self.sigmoid(z),(1-self.sigmoid(z)))

    def Feedforward(self,input,Train=False):
        self.__input_cases = np.shape(input)[0]

        #Empty list to hold the output of every layer.
        output_list = []
        #Appends the output of the 1st (input) layer.
        output_list.append(input)

        for i in range(self.__layers-1):
            if i == 0:
                output = self.sigmoid(np.dot(np.concatenate((np.ones((self.__input_cases,1)),input),1),self.__weights[0].T))
                output_list.append(output)
            else:
                output = self.sigmoid(np.dot(np.concatenate((np.ones((self.__input_cases,1)),output),1),self.__weights[i].T))
                output_list.append(output)

        #Returns the final output if not training.
        if Train == False:
            return output_list[-1]
        #Returns the entire output_list if needed for training.
        else:
            return output_list

    def CostFunction(self,input,target,error_func=1):
        """Gives the cost of using a particular weight matrix
        based off of the input and targeted output"""

        #Run the network to get output using current theta matrices.
        output = self.Feedforward(input)

        #####Allows user to choose Cost Functions.#####

        #
        #Log Based Error Function
        #
        if error_func == 0:
            error = np.multiply(-target,np.log(output))-np.multiply((1-target),np.log(1-output))
            total_error = np.sum(np.sum(error))
        #
        #Squared Error Cost Function
        #
        elif error_func == 1:
            error = (target - output)**2
            total_error = 0.5 * np.sum(np.sum(error))

        return total_error

    def Weight_Grad(self,input,target,output_list):

        #Finds the Error Deltas for Each Layer
        #
        deltas = []
        for i in range(self.__layers - 1):
            #Finds Error Delta for the last layer
            if i == 0:

                error = (target-output_list[-1])

                error_delta = -1*np.multiply(error,np.multiply(output_list[-1],(1-output_list[-1])))
                deltas.append(error_delta)
            #Finds Error Delta for the hidden layers
            else:
                #Weight matrices have bias values removed
                error_delta = np.multiply(np.dot(deltas[-1],self.__weights[-i][:,1:]),output_list[-i-1]*(1-output_list[-i-1]))
                deltas.append(error_delta)

        #
        #Finds the Deltas for each Weight Matrix
        #
        Weight_Delta_List = []
        deltas.reverse()
        for i in range(len(self.__weights)):

            current_weight_delta = (1/self.__input_cases) * np.dot(deltas[i].T,np.concatenate((np.ones((self.__input_cases,1)),output_list[i]),1))
            Weight_Delta_List.append(current_weight_delta)
            #print("Weight",i,"Delta:","\n",current_weight_delta)
            #print()

        #
        #Combines all Weight Deltas into a single row vector
        #
        Weight_Delta_Vector = np.array([[]])
        for i in Weight_Delta_List:

            Weight_Delta_Vector = np.concatenate((Weight_Delta_Vector,np.reshape(i,(1,-1))),1)
        return Weight_Delta_List

    def Train(self,input_data,target):
        #
        #Gradient Checking:
        #

        #First Get Gradients from first iteration of Back Propagation
        output_list = self.Feedforward(input_data,Train=True)
        self.__input_cases = np.shape(input_data)[0]

        Weight_Delta_List = self.Weight_Grad(input_data,target,output_list)

        #Creates List of Gradient Approx arrays set to zero.
        grad_approx_list = []
        for i in self.__weight_sizes:
            current_grad_approx = np.zeros(i)
            grad_approx_list.append(current_grad_approx)

        #Compute Approx. Gradient for every Weight Change
        for W in range(len(self.__weights)):
            for index,value in np.ndenumerate(self.__weights[W]):
                orig_value = self.__weights[W][index]      #Saves the Original Value
                print("Orig Value:", orig_value)

                #Sets weight to weight +/- epsilon
                self.__weights[W][index] = orig_value+.00001
                cost_plusE = self.CostFunction(input_data, target)

                self.__weights[W][index] = orig_value-.00001
                cost_minusE = self.CostFunction(input_data, target)

                #Solves for grad approx:
                grad_approx = (cost_plusE-cost_minusE)/(2*.00001)
                grad_approx_list[W][index] = grad_approx

                #Sets Weight Value back to its original value
                self.__weights[W][index] = orig_value

        #
        #Print Gradients from Back Prop. and Grad Approx. side-by-side:
        #

        print("Back Prop. Grad","\t","Grad. Approx")
        print("-"*15,"\t","-"*15)
        for W in range(len(self.__weights)):
            for index, value in np.ndenumerate(self.__weights[W]):
                print(self.__weights[W][index],"\t"*3,grad_approx_list[W][index])

        print("\n"*3)
        input_ = input("Press Enter to continue:")

        #
        #Perform Weight Updates for X number of Iterations
        #
        for i in range(10000):
            #Run the network
            output_list = self.Feedforward(input_data,Train=True)
            self.__input_cases = np.shape(input_data)[0]

            Weight_Delta_List = self.Weight_Grad(input_data,target,output_list)

            for w in range(len(self.__weights)):
                #print(self.__weights[w])
                #print(Weight_Delta_List[w])
                self.__weights[w] = self.__weights[w] - (.01*Weight_Delta_List[w])

        print("Done")

I even implemented gradient checking, and the values are different. I thought I would try replacing the back propagation updates with the approximate gradient-checking values, but that gave the same results, causing me to doubt even my gradient checking code.

Here are some of the values being produced when training for the XOR Gate:

Back Prop. Grad:  0.0756102610697   0.261814503398   0.0292734023876
Grad Approx:      0.05302210631166  0.0416095559674  0.0246847342122

Cost before training: 0.508019225507
Cost after training:  0.50007095103  (after 10000 epochs)

Output for 4 different examples (after training):
[ 0.49317733]
[ 0.49294556]
[ 0.50489004]
[ 0.50465824]

So my question is: is there any obvious problem with my back propagation or my gradient checking? Are there any usual problems when an ANN shows these symptoms (outputs all roughly the same / cost going down)?

Recommended Answer

I'm not very proficient at reading Python code, but your gradient list for XOR contains 3 elements, corresponding to 3 weights. I assume these are two inputs and one bias for a single neuron. If so, such a network cannot learn XOR (the minimum NN that can learn XOR needs two hidden neurons and one output unit). Now, looking at the Feedforward function: if np.dot computes what its name says (i.e. the dot product of two vectors) and sigmoid is a scalar, then this will always correspond to the output of one neuron, and I don't see how you could add more neurons to the layers with this code.
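
To make the architecture point concrete, here is a minimal sketch of what the suggested layout would look like with the class from the question (assuming, as the constructor suggests, that layer_shape is simply a list of layer sizes). Whether it then actually learns XOR still depends on the gradient bug being fixed; this only illustrates the capacity argument.

# Hypothetical sketch using the NeuralNetwork class from the question.
# A 2-2-1 layout: two inputs, two hidden neurons, one output unit --
# the minimum fully connected sigmoid network that can represent XOR.
net = NeuralNetwork([2, 2, 1])   # weight matrices of shape (2, 3) and (1, 3), bias included
net.Train(np.array([[0, 0], [0, 1], [1, 0], [1, 1]]),
          np.array([[0], [1], [1], [0]]))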

The following advice could be useful for debugging any newly implemented NN:

1) Don't start with MNIST or even XOR. A perfectly good implementation may fail to learn XOR because it can easily fall into a local minimum, and you could spend a lot of time hunting for a non-existent error. A good starting point is the AND function, which can be learned by a single neuron (see the sketch after this list).

2) Check the forward computation pass by manually computing results on a few examples; that's easy to do with a small number of weights. Then try to train it with the numerical gradient. If that fails, then either your numerical gradient is wrong (check it by hand) or your training procedure is wrong. (It can fail if you set the learning rate too large, but otherwise training must converge, since the error surface is convex.)

3) Once you can train it with the numerical gradient, debug your analytical gradients (check the gradient per neuron, and then the gradient for individual weights). That again can be computed manually and compared to what you see.

4) Upon completion of step 3, if everything works OK, add one hidden layer and repeat steps 2 and 3 with the AND function.

5) After everything works with AND, you can move on to the XOR function and other more complicated tasks.
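
As a companion to steps 1 and 2, below is a small standalone sketch (not the asker's class): a single sigmoid neuron learning the AND function, trained only with the central-difference numerical gradient. The learning rate, epoch count and epsilon here are arbitrary choices for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# AND truth table, with a bias column prepended to the inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [0], [0], [1]], dtype=float)
X_bias = np.concatenate((np.ones((4, 1)), X), axis=1)

w = np.random.normal(0.0, 0.1, size=(3, 1))      # bias weight + two input weights

def cost(w):
    out = sigmoid(X_bias.dot(w))
    return 0.5 * np.sum((y - out) ** 2)           # squared-error cost, as in the question

eps = 1e-5
for epoch in range(5000):
    grad = np.zeros_like(w)
    for idx, _ in np.ndenumerate(w):
        orig = w[idx]
        w[idx] = orig + eps
        cost_plus = cost(w)
        w[idx] = orig - eps
        cost_minus = cost(w)
        w[idx] = orig
        grad[idx] = (cost_plus - cost_minus) / (2 * eps)   # central difference
    w -= 1.0 * grad                               # plain gradient descent step

print("AND outputs after training:", sigmoid(X_bias.dot(w)).ravel())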

This procedure may seem time-consuming, but it almost always results in a working NN in the end.
