How to plot the decision boundary of logistic regression in scikit learn


Problem Description

I am trying to plot the decision boundary of logistic regression in scikit-learn, with the following data:

features_train_df: 650 columns, 5250 rows
features_test_df: 650 columns, 1750 rows
class_train_df: 1 column (class to be predicted), 5250 rows
class_test_df: 1 column (class to be predicted), 1750 rows

Classifier code:

from sklearn.linear_model import LogisticRegression

# Fit on the 650 selected features, then predict on the test set
tuned_logreg = LogisticRegression(penalty='l2', tol=0.0001, C=0.1, max_iter=100, class_weight='balanced')
tuned_logreg.fit(x_train[sorted_important_features_list[0:650]].values, y_train['loss'].values)
y_pred_3 = tuned_logreg.predict(x_test[sorted_important_features_list[0:650]].values)

I am getting the correct output from the classifier code.

I got this plotting code online:

import numpy as np
import matplotlib.pyplot as plt

X = features_train_df.values

# evenly sampled points over the range of the first two features
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())

# plot background colors
ax = plt.gca()
Z = tuned_logreg.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)
cs = ax.contourf(xx, yy, Z, cmap='RdBu', alpha=.5)
cs2 = ax.contour(xx, yy, Z, cmap='RdBu', alpha=.5)
plt.clabel(cs2, fmt='%2.1f', colors='k', fontsize=14)

# Plot the points
ax.plot(Xtrain[ytrain == 0, 0], Xtrain[ytrain == 0, 1], 'ro', label='Class 1')
ax.plot(Xtrain[ytrain == 1, 0], Xtrain[ytrain == 1, 1], 'bo', label='Class 2')

# make legend
plt.legend(loc='upper left', scatterpoints=1, numpoints=1)

Error:

 ValueError: X has 2 features per sample; expecting 650

Please tell me where I am going wrong.

Recommended Answer

I found the problem in your code. Please take a careful look at the following discussion.

xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50), np.linspace(y_min, y_max, 50))
grid = np.c_[xx.ravel(), yy.ravel()]
Z = tuned_logreg.predict_proba(grid)[:, 1]

Think about the shapes of the variables here:

np.linspace(x_min, x_max, 50) returns an array of 50 values. Applying np.meshgrid then makes the shape of xx and yy (50, 50). Finally, applying np.c_[xx.ravel(), yy.ravel()] makes the shape of grid (2500, 2). So you are giving predict_proba 2500 instances with only 2 feature values each.
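You can confirm those shapes yourself with a quick check (the bounds here are hypothetical, purely for illustration):

import numpy as np

# hypothetical bounds, just to illustrate the shapes
x_min, x_max, y_min, y_max = -1.0, 1.0, -1.0, 1.0
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))
grid = np.c_[xx.ravel(), yy.ravel()]
print(xx.shape)    # (50, 50)
print(yy.shape)    # (50, 50)
print(grid.shape)  # (2500, 2) -- 2 features per sample, not 650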

That is why you are getting the error: ValueError: X has 2 features per sample; expecting 650. The model was fit on 650 features, so anything passed to predict_proba must also have 650 columns (features).

During predict you did it correctly:

y_pred_3 = tuned_logreg.predict(x_test[sorted_important_features_list[0:650]].values)

So, make sure the number of features in the instances passed to the fit(), predict(), and predict_proba() methods is the same. For plotting a boundary anyway, see the sketch below.
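Since a boundary in 650 dimensions cannot be drawn directly, one common workaround (my suggestion, not part of the original answer) is to project the data to 2-D and fit a separate model just for visualization. A minimal sketch, assuming the features_train_df and class_train_df from the question:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Project the 650-dimensional data down to 2 components for plotting
pca = PCA(n_components=2)
X2 = pca.fit_transform(features_train_df.values)
y = class_train_df.values.ravel()

# Fit a separate 2-feature model purely for visualization
viz_clf = LogisticRegression(C=0.1, class_weight='balanced').fit(X2, y)

# Build a grid in the 2-D PCA plane and evaluate the model on it
x_min, x_max = X2[:, 0].min() - .5, X2[:, 0].max() + .5
y_min, y_max = X2[:, 1].min() - .5, X2[:, 1].max() + .5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))
Z = viz_clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap='RdBu', alpha=.5)
plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap='RdBu', edgecolors='k', s=15)
plt.show()

Note that this boundary belongs to the 2-feature visualization model, not to your 650-feature tuned_logreg; it only approximates how the classes separate in the projected plane.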

Explanation of the example from the SO post you linked:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
X, y = make_classification(200, 2, 2, 0, weights=[.5, .5], random_state=15)
clf = LogisticRegression().fit(X[:100], y[:100])

Here the shape of X is (200, 2), but when the classifier is trained they use X[:100], which means only the first 100 samples (each still with 2 features) across 2 classes. For prediction they use:

xx, yy = np.mgrid[-5:5:.01, -5:5:.01]
grid = np.c_[xx.ravel(), yy.ravel()]

Here, the shape of xx is (1000, 1000) and grid is (1000000, 2). So the number of features used both for training and for prediction is 2.
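Putting those pieces together, a self-contained sketch of the 2-feature case (my reconstruction, following the shapes quoted above) could look like this:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 200 samples, 2 features, 2 informative, 0 redundant
X, y = make_classification(200, 2, 2, 0, weights=[.5, .5], random_state=15)
clf = LogisticRegression().fit(X[:100], y[:100])

# Grid over the 2-D feature space: xx and yy are each (1000, 1000)
xx, yy = np.mgrid[-5:5:.01, -5:5:.01]
grid = np.c_[xx.ravel(), yy.ravel()]   # (1000000, 2): matches the 2 training features

# Probability of class 1 over the grid, reshaped for contour plotting
Z = clf.predict_proba(grid)[:, 1].reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap='RdBu', alpha=.5)
plt.scatter(X[:100, 0], X[:100, 1], c=y[:100], cmap='RdBu', edgecolors='k')
plt.show()

Because the grid has exactly the 2 features the model was trained on, predict_proba accepts it without raising the ValueError.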

