Neural network (perceptron) - visualizing decision boundary (as a hyperplane) when performing binary classification


Problem description

I would like to visualize the decision boundary for a simple neural network with only one neuron (3 inputs, binary output). I'm extracting the weights from a Keras NN model and then attempting to draw the surface plane using matplotlib. Unfortunately, the hyperplane is not appearing between the points on the scatter plot, but instead is displaying underneath all the data points (see output image).

I am calculating the z-values of the hyperplane using the equation z = (d - ax - by) / c, for a hyperplane defined as ax + by + cz = d.

Could somebody assist me with correctly constructing and displaying a hyperplane based on the NN weights?

The goal here is to classify individuals into two groups (diabetes or no diabetes), based on 3 predictor variables using a public dataset (https://www.kaggle.com/uciml/pima-indians-diabetes-database).

%matplotlib notebook

import pandas as pd
import numpy as np
from keras import models
from keras import layers
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d

EPOCHS = 2

#Data source: https://www.kaggle.com/uciml/pima-indians-diabetes-database
ds = pd.read_csv('diabetes.csv', sep=',', header=0)

#subset and split
X = ds[['BMI', 'DiabetesPedigreeFunction', 'Glucose']]
Y = ds[['Outcome']]

#construct perceptron with 3 inputs and a single output
model = models.Sequential()
layer1 = layers.Dense(1, activation='sigmoid', input_shape=(3,))
model.add(layer1)

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

#train perceptron
history = model.fit(x=X, y=Y, epochs=EPOCHS)

#display accuracy and loss
epochs = range(len(history.epoch))

plt.figure()
plt.plot(epochs, history.history['accuracy'])
plt.xlabel('Epochs')
plt.ylabel('Accuracy')

plt.figure()
plt.plot(epochs, history.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('Loss')

plt.show()

#extract weights and bias from model
weights = model.layers[0].get_weights()[0]
biases = model.layers[0].get_weights()[1]

w1 = weights[0][0] #a
w2 = weights[1][0] #b
w3 = weights[2][0] #c
b = biases[0]      #d

#construct hyperplane: ax + by + cz = d
a,b,c,d = w1,w2,w3,b

x_min = ds.BMI.min()
x_max = ds.BMI.max()

x = np.linspace(x_min, x_max, 100)

y_min = ds.DiabetesPedigreeFunction.min()
y_max = ds.DiabetesPedigreeFunction.max()

y = np.linspace(y_min, y_max, 100)

Xs,Ys = np.meshgrid(x,y)
Zs = (d - a*Xs - b*Ys) / c

#visualize 3d scatterplot with hyperplane
fig = plt.figure(num=None, figsize=(9, 9), dpi=100, facecolor='w', edgecolor='k')
ax = fig.add_subplot(projection='3d')

ax.plot_surface(Xs, Ys, Zs, alpha=0.45)

ax.scatter(ds.BMI, ds.DiabetesPedigreeFunction, ds.Glucose, c=ds.Outcome)

ax.set_xlabel('BMI')
ax.set_ylabel('DiabetesPedigreeFunction')
ax.set_zlabel('Glucose')

Recommended answer

Best guess without reading all the code in detail. It looks like you applied a sigmoid activation. If you train with no activation (activation='linear'), you should get the visualization you are looking for. You may have to train longer to get convergence (assuming it can converge without an activation). If you want to keep the sigmoid, then you need to map your linear neuron through this activation (hence it won't look like a plane anymore).

My understanding of NNs: a dense layer from 3 inputs to 1 output with a sigmoid activation is an attempt to optimize the variables a, b, c, d in the equation:

f(x,y,z) = 1/(1 + e^(-D(x,y,z))), where D(x,y,z) = ax + by + cz + d
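
For concreteness, a minimal NumPy sketch of that model (the names D and f are just illustrative, not anything from the code above):

import numpy as np

def D(x, y, z, a, b, c, d):
    # linear part of the single neuron: a*x + b*y + c*z + d
    return a * x + b * y + c * z + d

def f(x, y, z, a, b, c, d):
    # sigmoid of the linear part: the neuron's predicted probability
    return 1.0 / (1.0 + np.exp(-D(x, y, z, a, b, c, d)))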

so that the binary_crossentropy (what you picked) is minimized. I will use B for the per-sample binary cross-entropy term (the sum of the logs). Our loss equation would look something like:

L = ∑ B(y,Y)

where y is the value we want to predict (a 0 or 1 in this case) and Y is the value output by the equation above; the sum runs over all the data (or batches in a NN). Hence, this can be written as

L = ∑ B(y,f(x,y,z))
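
Continuing the NumPy sketch above, a rough version of that summed loss (the helper name bce is only illustrative):

def bce(y_true, y_pred, eps=1e-7):
    # per-sample binary cross-entropy: -(y*log(Y) + (1 - y)*log(1 - Y))
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# L = sum of B(y, f(x, y, z)) over all samples
# (Keras averages rather than sums, which only rescales the loss)
# loss = np.sum(bce(labels, predictions))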

Finding the minimum of L over the variables a, b, c, d can probably be calculated directly by taking partial derivatives and solving the resulting system of equations (this is why a NN should rarely be used with a small set of variables, like 4, because they can be solved for explicitly, so there is little point in training). Whether we solve directly or use stochastic gradient descent to slowly move a, b, c, d toward a minimum, in either case we end up with the optimized a, b, c, d.
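
A bare-bones full-batch version of that descent, assuming X3 is an (n, 3) NumPy array holding the three predictors and labels is an (n,) array of 0/1 outcomes (both names are placeholders):

import numpy as np

w = np.zeros(3)   # a, b, c
d = 0.0           # the bias d
lr = 0.01         # learning rate

for step in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X3 @ w + d)))     # f(x, y, z) for every sample
    grad_w = X3.T @ (p - labels) / len(labels)  # dL/d[a, b, c] for the mean cross-entropy
    grad_d = np.mean(p - labels)                # dL/dd
    w -= lr * grad_w
    d -= lr * grad_d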

a, b, c, d have been tuned specifically to produce values that, when plugged into the sigmoid equation, produce predicted categories that, when tested in the loss equation, give us a minimum loss.

I stand corrected, though. In this case, because we specifically have a sigmoid, setting up and solving the boundary equation does appear to always produce a plane (I did not know that). I don't think this would work with any other activation or with any NN that has more than one layer.

1/2 = 1/(1 + e^(-D(x,y,z)))  =>  D(x,y,z) = 0  =>  ax + by + cz + d = 0
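
As a sketch of what that plane looks like with the quantities already computed in the question's code (Xs, Ys, weights, biases, and the 3D axes ax are reused from above):

# decision boundary of the sigmoid neuron: w1*x + w2*y + w3*z + bias = 0
w1, w2, w3 = weights[:, 0]   # coefficients for BMI, DiabetesPedigreeFunction, Glucose
bias = biases[0]

Zs = -(bias + w1 * Xs + w2 * Ys) / w3   # solve the boundary equation for z (Glucose)
ax.plot_surface(Xs, Ys, Zs, alpha=0.45)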

So, I downloaded your data and ran your code. I don't get convergence at all; I tried various batch_sizes, loss functions, and activation functions. Nothing. Based on the picture, it seems plausible that nearly any set of randomized weights will favor moving away from the cluster rather than trying to find its center.

You probably need to transform your data first (normalizing on all the axes might do the trick), or manually set your weights to something near the center, so that the training converges. Long story short, your a, b, c, d are not optimal. You could also explicitly solve the partial derivatives above and find the optimal a, b, c, d instead of trying to get a single neuron to converge. There are also explicit equations for calculating the optimal plane that separates binary data (an extension of linear regression).
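
For example, one way to do that normalization before fitting (a sketch only; scikit-learn's StandardScaler is an assumption here, any per-column standardization would do, and X, Y, model come from the question's code):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)                 # zero mean, unit variance per predictor
history = model.fit(x=X_scaled, y=Y, epochs=100)   # train on the scaled inputs

# if you train in the scaled space, plot the scatter and the plane in that same space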
