Unable to solve the XOR problem with just two hidden neurons in Python

Problem description

I have a small 3-layer neural network with two input neurons, two hidden neurons, and one output neuron. I am trying to stick to the format below, which uses only 2 hidden neurons.

I am trying to show how this network can behave as an XOR logic gate; however, with just two hidden neurons I get the following poor output after 1,000,000 iterations:

Input: 0 0   Output:  [0.01039096]
Input: 1 0   Output:  [0.93708829]
Input: 0 1   Output:  [0.93599738]
Input: 1 1   Output:  [0.51917667]

If I use three hidden neurons I get a much better output with 100,000 iterations:

Input: 0 0   Output:  [0.01831612]
Input: 1 0   Output:  [0.98558057]
Input: 0 1   Output:  [0.98567602]
Input: 1 1   Output:  [0.02007876]

I am getting a decent output with 3 neurons in the hidden layer but not with two neurons in the hidden layer. Why?

As per a comment below, this repo contains code that solves the XOR problem using two hidden neurons.

I can't figure out what I am doing wrong. Any suggestions are appreciated! Attached is my code:

import numpy as np
from matplotlib import pyplot as plt


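# Sigmoid function; with deriv=True, x must already be a sigmoid output,
# so x * (1 - x) is the derivative at that point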
def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))


alpha = 0.7  # learning rate

# Input dataset
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

# Output dataset
y = np.array([[0, 1, 1, 0]]).T

# seed random numbers to make calculation deterministic
np.random.seed(1)

# initialise weights randomly with mean 0
syn0 = 2 * np.random.random((2, 3)) - 1  # 1st layer of weights, synapse 0, connecting L0 to L1 (use (2, 2) for two hidden neurons)
syn1 = 2 * np.random.random((3, 1)) - 1  # 2nd layer of weights, synapse 1, connecting L1 to L2 (use (2, 1) for two hidden neurons)

# Randomize inputs for stochastic gradient descent
data = np.hstack((X, y))    # append Input and output dataset
np.random.shuffle(data)     # shuffle
x, y = np.array_split(data, 2, 1)    # Split along vertical(1) axis

for epoch in range(100000):
    for i in range(4):
        # forward prop
        layer0 = x[i]  # Input layer
        layer1 = sigmoid(np.dot(layer0, syn0))  # Prediction step for layer 1
        layer2 = sigmoid(np.dot(layer1, syn1))  # Prediction step for layer 2

        layer2_error = y[i] - layer2  # Compare layer2's guess with the target output

        layer2_delta = layer2_error * sigmoid(layer2, deriv=True)  # Error weighted derivative step

        if epoch % 10000 == 0:
            print("Error: ", str(np.mean(np.abs(layer2_error))))
            plt.plot(epoch, layer2_error, 'ro')


        # Uses "confidence weighted error" from l2 to establish an error for l1
        layer1_error = layer2_delta.dot(syn1.T)

        layer1_delta = layer1_error * sigmoid(layer1, deriv=True)  # Error weighted derivative step

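        # With per-sample SGD, layer1 and layer2_delta are 1-D arrays; [:, None] and [None, :]
        # turn them into a column and a row, so np.dot gives their outer product (the weight update)
        syn1 += alpha * np.dot(layer1[:, None], layer2_delta[None, :])  # Update weights
        syn0 += alpha * np.dot(layer0[:, None], layer1_delta[None, :])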

# Training was done above; below we re-run the forward pass once on all four inputs to test it
layer0 = X  # Input layer (unshuffled, so the labels below match)
layer1 = sigmoid(np.dot(layer0, syn0))  # Prediction step for layer 1
layer2 = sigmoid(np.dot(layer1, syn1))  # Prediction step for layer 2

plt.show()
print("output after training: \n")
for inputs, output in zip(X, layer2):
    print("Input:", *inputs, "\t Output: ", output)

Solution

This happens because your neurons have no bias terms: you use only weights to fit the XOR model.
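
To see why missing biases matter, here is a quick numerical check (a sketch; the random weight range and sample count are arbitrary assumptions). Without a bias, a hidden neuron's pre-activation for input (0, 0) is always 0, so its output is pinned at sigmoid(0) = 0.5 whatever the weights are, and the bias-free output for (0, 0) then depends only on the sum of the hidden-to-output weights.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
origin = np.array([0.0, 0.0])

# Try many random bias-free weight matrices shaped like syn0 (2 inputs -> 2 hidden)
hidden_at_origin = []
for _ in range(1000):
    syn0 = rng.uniform(-10, 10, size=(2, 2))
    hidden_at_origin.append(sigmoid(origin @ syn0))
hidden_at_origin = np.array(hidden_at_origin)

# Every hidden neuron outputs sigmoid(0) = 0.5 for input (0, 0), whatever the weights
print(np.unique(hidden_at_origin))  # -> [0.5]

# Consequently the bias-free output for (0, 0) is sigmoid(0.5 * sum(syn1)), fixed by syn1 alone
syn1 = rng.uniform(-10, 10, size=(2, 1))
out = sigmoid(sigmoid(origin @ syn0) @ syn1)
print(np.isclose(out[0], sigmoid(0.5 * syn1.sum())))  # -> True
```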

With 2 neurons in the hidden layer, the network underfits because it cannot compensate for the missing bias.
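
Conversely, once biases are available, two hidden neurons are enough. The sketch below uses hand-picked, purely illustrative weights (not taken from the question or the linked example): hidden neuron 0 acts like OR, hidden neuron 1 acts like AND, and the output computes "OR and not AND", which is XOR. The names `W1`, `b1`, `W2`, `b2` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked weights, shape (2 inputs, 2 hidden)
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])     # hidden biases (theta): OR threshold, AND threshold
W2 = np.array([[20.0], [-20.0]])  # output fires for OR but is suppressed by AND
b2 = np.array([-10.0])            # output bias

hidden = sigmoid(X @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)

for x, out in zip(X, output):
    print("Input:", x, "\t Output:", out)
print(np.round(output).ravel())  # -> [0. 1. 1. 0.]
```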

With 3 neurons in the hidden layer, the extra neuron compensates for the missing bias.
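
One rough way to see the trade-off is to count free parameters. Counting alone does not prove the 3-neuron network can fit XOR, but it shows that the bias-free 2-3-1 network has as many parameters as a 2-2-1 network with biases. The helper `n_params` is hypothetical.

```python
def n_params(n_in, n_hidden, n_out, bias):
    """Count weights (and optionally one bias per hidden/output neuron) in an n_in-n_hidden-n_out net."""
    weights = n_in * n_hidden + n_hidden * n_out
    biases = (n_hidden + n_out) if bias else 0
    return weights + biases


print(n_params(2, 2, 1, bias=False))  # -> 6  (the question's two-neuron setup)
print(n_params(2, 3, 1, bias=False))  # -> 9  (three hidden neurons, no bias)
print(n_params(2, 2, 1, bias=True))   # -> 9  (two hidden neurons with biases)
```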

This is an example of a network for the XOR gate. Notice the theta (bias) terms added to the hidden layers; they give the network an additional parameter to tweak.
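
Applied to the question's training loop, adding biases might look like the sketch below. It is not the code from the linked example. Assumptions: per-sample SGD with the question's learning rate of 0.7, uniform(-1, 1) initialization (like `2 * random - 1`), 10,000 epochs, and bias arrays `b0` (hidden) and `b1` (output) updated like weights on a constant input of 1. Because a 2-2-1 network can still stall on a plateau for some initializations, the sketch also tries a few seeds; `train` and `predict` are hypothetical helpers.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
alpha = 0.7   # learning rate, as in the question
n_hidden = 2  # only two hidden neurons


def train(seed, epochs=10000):
    rng = np.random.default_rng(seed)
    syn0 = rng.uniform(-1, 1, size=(2, n_hidden))  # input -> hidden weights
    b0 = rng.uniform(-1, 1, size=n_hidden)         # hidden biases (theta)
    syn1 = rng.uniform(-1, 1, size=(n_hidden, 1))  # hidden -> output weights
    b1 = rng.uniform(-1, 1, size=1)                # output bias
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            layer1 = sigmoid(x_i @ syn0 + b0)
            layer2 = sigmoid(layer1 @ syn1 + b1)
            layer2_delta = (y_i - layer2) * layer2 * (1 - layer2)
            layer1_delta = (syn1 @ layer2_delta) * layer1 * (1 - layer1)
            # Gradient steps; each bias is updated like a weight whose input is always 1
            syn1 += alpha * np.outer(layer1, layer2_delta)
            b1 += alpha * layer2_delta
            syn0 += alpha * np.outer(x_i, layer1_delta)
            b0 += alpha * layer1_delta
    return syn0, b0, syn1, b1


def predict(params):
    syn0, b0, syn1, b1 = params
    return sigmoid(sigmoid(X @ syn0 + b0) @ syn1 + b1)


for seed in range(10):
    out = predict(train(seed))
    if np.all(np.abs(out - y) < 0.5):  # every input on the correct side of 0.5
        break

for x_i, out_i in zip(X, out):
    print("Input:", x_i, "\t Output:", out_i)
```

Compared with the question's code, the only structural change is the `+ b0` / `+ b1` terms in the forward pass and their matching updates; removing them brings back the bias-free network.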
