How to find the importance of the features for a logistic regression model?


Question

I have a binary prediction model trained with logistic regression. I want to know which features (predictors) are more important for the decision between the positive and the negative class. I know scikit-learn exposes the coef_ attribute, but I don't know whether it is enough to judge importance. I would also like to know how to evaluate the coef_ values in terms of importance for the negative and positive classes. I have also read about standardized regression coefficients, but I don't know what they are.

Let's say there are features such as tumor size, tumor weight, and so on, used to decide whether a test case is malignant or not malignant. I want to know which of the features are more important for the malignant and the not-malignant prediction. Does that make sense?

Recommended answer

One of the simplest ways to get a feel for the "influence" of a given parameter in a linear classification model (logistic regression being one of those) is to consider the magnitude of its coefficient multiplied by the standard deviation of the corresponding parameter in the data.

Consider the following example:

import numpy as np    
from sklearn.linear_model import LogisticRegression

x1 = np.random.randn(100)
x2 = 4*np.random.randn(100)
x3 = 0.5*np.random.randn(100)
# Draw one noise value per sample (a bare randn() would add a single scalar to every row)
y = (3 + x1 + x2 + x3 + 0.2*np.random.randn(100)) > 0
X = np.column_stack([x1, x2, x3])

m = LogisticRegression()
m.fit(X, y)

# The true weights are all 1; the fitted coefficients will deviate from
# that to a degree that depends on regularization and on the sample:
print(m.coef_)

# Those values, however, will show that the second parameter
# is more influential
print(np.std(X, 0)*m.coef_)

An alternative way to get a similar result is to examine the coefficients of the model fit on standardized parameters:

m.fit(X / np.std(X, 0), y)
print(m.coef_)
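
The same idea can be sketched with a scikit-learn pipeline, which may be more convenient in practice because the scaling is then applied consistently at prediction time. This is only a minimal illustration under stated assumptions: the synthetic data mirrors the example above, while the seed, the sample size, and the feature labels x1, x2, x3 are made up for the sketch. Note that StandardScaler also centers the features, which shifts the intercept but not the interpretation of the coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 500                         # sample size is an assumption of this sketch

# Three features on very different scales, as in the example above
X = np.column_stack([
    rng.standard_normal(n),
    4 * rng.standard_normal(n),
    0.5 * rng.standard_normal(n),
])
# Same linear decision rule, with per-sample noise
y = (3 + X.sum(axis=1) + 0.2 * rng.standard_normal(n)) > 0

# Scale to zero mean / unit variance, then fit the logistic regression
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)

# Coefficients on the standardized features; make_pipeline names each
# step after its lowercased class name
std_coef = pipe.named_steps["logisticregression"].coef_.ravel()

feature_names = ["x1", "x2", "x3"]  # hypothetical labels for illustration
ranking = sorted(zip(feature_names, std_coef),
                 key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
```

The sign of each coefficient indicates direction: a positive value pushes the prediction toward the class listed second in the fitted model's classes_ attribute (here True), and a negative value pushes it toward the other class. The absolute value is what is being used as a rough measure of influence.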

Note that this is the most basic approach, and a number of other techniques exist for finding feature importance or parameter influence (using p-values, bootstrap scores, various "discriminative indices", etc.).
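
As a rough illustration of the bootstrap idea mentioned above, the following sketch refits the model on resampled data and looks at how stable the "coefficient times standard deviation" score is for each feature. It is one possible reading of "bootstrap scores", not a standard recipe from the answer: the number of resamples, the seed, the sample size, and the 95% percentile interval are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 300                         # sample size is an assumption of this sketch

# Synthetic data in the spirit of the example above
X = np.column_stack([
    rng.standard_normal(n),
    4 * rng.standard_normal(n),
    0.5 * rng.standard_normal(n),
])
y = (3 + X.sum(axis=1) + 0.2 * rng.standard_normal(n)) > 0

n_boot = 200  # number of bootstrap resamples (arbitrary choice)
rows = []
for _ in range(n_boot):
    # Resample rows with replacement
    idx = rng.integers(0, n, size=n)
    Xb, yb = X[idx], y[idx]
    if np.unique(yb).size < 2:
        continue  # a resample containing a single class cannot be fit
    model = LogisticRegression().fit(Xb, yb)
    # Influence score: coefficient times the feature's standard deviation
    rows.append(np.std(Xb, axis=0) * model.coef_.ravel())

scores = np.array(rows)

# Percentile interval of the score for each feature
lo, hi = np.percentile(scores, [2.5, 97.5], axis=0)
for j, (a, b) in enumerate(zip(lo, hi), start=1):
    print(f"x{j}: [{a:+.3f}, {b:+.3f}]")
```

A feature whose interval stays well away from zero across resamples can be read as having a consistent influence on the prediction, while an interval that straddles zero suggests the estimated effect is not stable for this sample.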

I am pretty sure you would get more interesting answers at https://stats.stackexchange.com/.
