LinearSVC Feature Selection returns different coef_ in Python


Problem description

I'm using SelectFromModel with a LinearSVC on a training data set. The training and testing sets have already been split and are saved in separate files. When I fit the LinearSVC on the training set I get a set of coef_[0] values, from which I try to find the most important features. When I rerun the script I get different coef_[0] values, even though it runs on the same training data. Why is this the case?

See below for a snippet of the code (maybe there's a bug I don't see):

# Imports assumed by this snippet
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

fig = plt.figure()

# SelectFromModel
lsvc = LinearSVC(C=.01, penalty="l1", dual=False).fit(X_train, Y_train.values.ravel())
X_trainPro = SelectFromModel(lsvc, prefit=True)  # created but never used below
sscores = lsvc.coef_[0]
print(sscores)
ax = fig.add_subplot(1, 1, 1)

# Normalize the absolute coefficients so they sum to 1
for i in range(len(sscores)):
    sscores[i] = np.abs(sscores[i])

sscores_sum = 0
for i in range(len(sscores)):
    sscores_sum = sscores_sum + sscores[i]

for i in range(len(sscores)):
    sscores[i] = sscores[i] / sscores_sum

# Count how many top features are needed to cover 90% of the total weight
stemp = sscores.copy()
total_weight = 0
feature_numbers = 0
while (total_weight <= .9):
    total_weight = total_weight + stemp.max()
    stemp[np.nonzero(stemp == stemp.max())[0][0]] = 0
    feature_numbers += 1

print(total_weight, feature_numbers)

# Order the feature names and scores from highest to lowest
stemp = sscores.copy()
sfeaturenames = np.array([])
orderScore = np.array([])
for i in range(len(sscores)):
    sfeaturenames = np.append(sfeaturenames, X_train.columns[np.nonzero(stemp == stemp.max())[0][0]])
    orderScore = np.append(orderScore, stemp.max())
    stemp[np.nonzero(stemp == stemp.max())[0][0]] = -1

# Plot the selected (green) vs. remaining (blue) features
lowscore = orderScore[feature_numbers]
smask1 = orderScore <= lowscore
smask2 = orderScore > lowscore
ax.bar(sfeaturenames[smask2], orderScore[smask2], align="center", color="green")
ax.bar(sfeaturenames[smask1], orderScore[smask1], align="center", color="blue")
ax.set_title("SelectFromModel")
ax.tick_params(labelrotation=90)

plt.subplots_adjust(hspace=2, bottom=.2, top=.85)
plt.show()

# Selection of the top values to use
Top_Rank = np.array([])
scores = sscores

for i in range(feature_numbers):
    Top_item = scores.max()
    Top_item_loc = np.where(scores == np.max(scores))
    Top_Rank = np.append(Top_Rank, X_train.columns[Top_item_loc])
    scores[Top_item_loc] = 0
print(Top_Rank)
X_train = X_train[Top_Rank]
X_test = X_test[Top_Rank]
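As an aside, the SelectFromModel object created above is never actually applied. A minimal sketch of how it could be used to reduce the feature matrix directly (using synthetic data from make_classification as a stand-in for the question's X_train/Y_train):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

# Toy data standing in for X_train / Y_train
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

lsvc = LinearSVC(C=.01, penalty="l1", dual=False).fit(X, y)
selector = SelectFromModel(lsvc, prefit=True)

# Keep only the features whose coefficients exceed the threshold
# (for an l1 penalty, features with zero coefficients are dropped)
X_reduced = selector.transform(X)
print(X.shape, X_reduced.shape)
```

This replaces the manual ranking-and-masking loops with the selector's own threshold logic.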

Answer

Since you set dual=False, you should be getting the same coefficients. What is your sklearn version?
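For context, when dual=True (LinearSVC's default), liblinear's dual solver involves a random data ordering, so random_state is what makes repeated fits reproducible. A hedged sketch of this (using l2 penalty, since l1 with dual=True is not supported):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_features=4, random_state=0)

# Pinning random_state makes repeated dual-solver fits identical
coefs = []
for _ in range(5):
    lsvc = LinearSVC(C=1.0, dual=True, random_state=0, max_iter=10000).fit(X, y)
    coefs.append(lsvc.coef_[0].copy())
```

With dual=False the primal solver is deterministic, which is why the answer expects identical coefficients.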

Run this and check if you get the same output:

from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_features=4, random_state=0)
for i in range(10):
    lsvc = LinearSVC(C=.01, penalty="l1", dual=False).fit(X, y)
    sscores = lsvc.coef_[0]
    print(sscores)

The output should be exactly the same:

[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
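To answer the version question above, the installed scikit-learn version can be checked with:

```python
import sklearn
print(sklearn.__version__)
```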
