Recursive feature selection may not yield higher performance?

Problem description

I'm trying to analyze the data below. I first modeled it with logistic regression, made predictions, and calculated the accuracy & AUC; then I performed recursive feature selection and calculated the accuracy & AUC again. I expected the accuracy and AUC to be higher, but both are actually lower after the recursive feature selection. Is that expected, or did I miss something? Thanks!

Data: https://github.com/amandawang-dev/census-training/blob/master/census-training.csv

For logistic regression, Accuracy: 0.8111649491571692; AUC: 0.824896256487386

import pandas as pd
import numpy as np
from sklearn import preprocessing, metrics
from sklearn.model_selection import train_test_split


train=pd.read_csv('census-training.csv')
train = train.replace('?', np.nan)
for column in train.columns:
    train[column].fillna(train[column].mode()[0], inplace=True)
train['Income'] = train['Income'].str.contains('>50K').astype(int)  # binary target: 1 if income >50K
train['Gender'] = train['Gender'].str.contains('Male').astype(int)

obj = train.select_dtypes(include=['object'])  # all remaining categorical ('object') columns
le = preprocessing.LabelEncoder()
for col in obj.columns:
    train[col] = le.fit_transform(train[col])  # integer-encode each categorical column

train_set, test_set = train_test_split(train, test_size=0.3, random_state=42)

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score
from sklearn.metrics import accuracy_score


log_rgr = LogisticRegression(random_state=0)


X_train = train_set.iloc[:, 0:9]
y_train = train_set.iloc[:, 9]  # 1-D target (a Series) avoids a DataConversionWarning in fit()

X_test = test_set.iloc[:, 0:9]
y_test = test_set.iloc[:, 9]

log_rgr.fit(X_train, y_train)

y_pred = log_rgr.predict(X_test)

lr_acc = accuracy_score(y_test, y_pred)

probs = log_rgr.predict_proba(X_test)
preds = probs[:,1]
print(preds)
from sklearn.preprocessing import label_binarize
y = label_binarize(y_test, classes=[0, 1])  # ROC expects binary 0/1 labels
fpr, tpr, threshold = metrics.roc_curve(y, preds)

roc_auc = roc_auc_score(y_test, preds)

print("Accuracy: {}".format(lr_acc))
print("AUC: {}".format(roc_auc))


from sklearn.feature_selection import RFE


rfe = RFE(log_rgr, n_features_to_select=5)  # keep the 5 top-ranked features; keyword required in recent scikit-learn
fit = rfe.fit(X_train, y_train)

X_train_new = fit.transform(X_train)
X_test_new = fit.transform(X_test)

log_rgr.fit(X_train_new, y_train)
y_pred = log_rgr.predict(X_test_new)

lr_acc = accuracy_score(y_test, y_pred)

probs = log_rgr.predict_proba(X_test_new)  # score the refit model on the reduced feature set
preds = probs[:, 1]
y = label_binarize(y_test, classes=[0, 1])

fpr, tpr, threshold = metrics.roc_curve(y, preds)
roc_auc = roc_auc_score(y_test, preds)

print("Accuracy: {}".format(lr_acc))
print("AUC: {}".format(roc_auc))

Recommended answer

There is simply no guarantee that any kind of feature selection (backward, forward, recursive - you name it) will actually lead to better performance in general. None at all. Such tools are there for convenience only - they may work, or they may not. The best guide and ultimate judge is always the experiment.
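
One way to run that experiment, rather than hard-coding a feature count, is to cross-validate the choice. Below is a minimal sketch (not the asker's original code) that reuses X_train/y_train/X_test/y_test from the question and lets RFECV pick the subset size by cross-validated AUC:

from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Try every candidate number of retained features and keep the count with
# the best cross-validated AUC, instead of fixing n_features_to_select=5.
rfecv = RFECV(log_rgr, step=1, cv=StratifiedKFold(5), scoring='roc_auc')
rfecv.fit(X_train, y_train)

print("Optimal number of features:", rfecv.n_features_)
probs = rfecv.predict_proba(X_test)[:, 1]
print("Test AUC with the selected subset:", roc_auc_score(y_test, probs))

If the cross-validated optimum turns out to be all of the features, that is itself the answer: for this data and model, dropping features simply does not help.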

Apart from some very specific cases in linear or logistic regression, most notably the Lasso (which, no coincidence, actually comes from statistics), or somewhat extreme cases with too many features (a.k.a. the curse of dimensionality), even when it works (or doesn't), there is not necessarily much to explain as to why (or why not).
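
For completeness, here is a hedged sketch of that Lasso-style alternative: an L1-penalized logistic regression does selection and fitting in one step by shrinking uninformative coefficients exactly to zero. It assumes the same X_train/y_train as in the question, and C=0.1 is an arbitrary illustrative value, not a tuned one:

from sklearn.linear_model import LogisticRegression

# Smaller C means a stronger L1 penalty and therefore fewer surviving features.
l1_model = LogisticRegression(penalty='l1', solver='liblinear', C=0.1, random_state=0)
l1_model.fit(X_train, y_train)

kept = X_train.columns[l1_model.coef_[0] != 0]  # columns whose coefficients were not zeroed out
print("Features kept by the L1 penalty:", list(kept))

Unlike RFE, which repeatedly refits the model and drops the weakest feature, the penalty here is part of the loss being optimized, which is why it is called an embedded rather than a wrapper method.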
