How can I get the relative importance of features of a logistic regression for a particular prediction?


Question

I am using a Logistic Regression (in scikit) for a binary classification problem, and am interested in being able to explain each individual prediction. To be more precise, I'm interested in predicting the probability of the positive class, and having a measure of the importance of each feature for that prediction.

Using the coefficients (betas) as a measure of importance is generally a bad idea, as answered here, but I have yet to find a good alternative.

So far the best I have found are the following 3 options:

  1. Monte Carlo Option: Holding all other features fixed, re-run the prediction, replacing the feature we want to evaluate with random samples from the training set. Do this a large number of times. This establishes a baseline probability for the positive class, which we then compare with the positive-class probability of the original run. The difference is a measure of the importance of the feature.
  2. "Leave-one-out" classifiers: To evaluate the importance of a feature, first create a model which uses all features, and then another that uses all features except the one being tested. Predict the new observation using both models. The difference between the two would be the importance of the feature.
  3. Adjusted betas: Based on this answer, ranking the importance of the features by 'the magnitude of its coefficient times the standard deviation of the corresponding parameter in the data.'
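Option 1 can be sketched roughly as follows. This is a minimal illustration on synthetic data; the names `X_train`, `model`, `x` and `mc_importance` are mine, not from the question:

```python
# Sketch of option 1 (Monte Carlo): estimate the importance of one feature
# for a single prediction by resampling that feature from the training set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train, y_train = make_classification(n_samples=500, n_features=4, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

def mc_importance(model, x, X_train, feat_idx, n_draws=1000):
    """Change in P(y=1) for observation x when feature `feat_idx`
    is replaced by random draws from the training distribution."""
    p_orig = model.predict_proba(x.reshape(1, -1))[0, 1]
    X_rep = np.tile(x, (n_draws, 1))                      # n_draws copies of x
    X_rep[:, feat_idx] = rng.choice(X_train[:, feat_idx], size=n_draws)
    p_baseline = model.predict_proba(X_rep)[:, 1].mean()  # Monte Carlo baseline
    return p_orig - p_baseline

x = X_train[0]  # the observation we want to explain
importances = [mc_importance(model, x, X_train, j) for j in range(X_train.shape[1])]
```

A positive value means the actual feature value pushes the prediction toward the positive class relative to a "typical" value of that feature.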

All options (using betas, Monte Carlo and "Leave-one-out") seem like poor solutions to me.

  1. Monte Carlo depends on the distribution of the training set, and I could not find any literature to support it.
  2. "Leave-one-out" is easily fooled by two correlated features (when one is absent, the other steps in to compensate, and both end up assigned an importance of 0).
  3. Adjusted betas sound reasonable, but I could not find any literature to support them either.
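For comparison, option 2 ("leave-one-out" classifiers) can be sketched like this; note it retrains one model per feature and, as pointed out above, correlated features can mask each other. Names (`X`, `y`, `full`, `reduced`) are illustrative:

```python
# Sketch of option 2: retrain without each feature and compare the
# predicted positive-class probabilities for one observation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
full = LogisticRegression().fit(X, y)

x = X[0]
p_full = full.predict_proba(x.reshape(1, -1))[0, 1]

importances = []
for j in range(X.shape[1]):
    keep = [k for k in range(X.shape[1]) if k != j]   # drop feature j
    reduced = LogisticRegression().fit(X[:, keep], y)
    p_reduced = reduced.predict_proba(x[keep].reshape(1, -1))[0, 1]
    importances.append(p_full - p_reduced)
```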

Actual question: What is the best way to interpret the importance of each feature, at the moment of a decision, with a linear classifier?

Quick note #1: for Random Forests this is trivial, we can simply use the prediction + bias decomposition, as explained beautifully in this blog post. The problem here is how to do something similar with linear classifiers such as Logistic Regression.

Quick note #2: there are a number of related questions on stackoverflow (1 2 3 4 5). I have not been able to find an answer to this specific question.

Answer

If you want the importance of the features for a particular decision, why not simulate the decision_function (which scikit-learn provides, so you can check that you get the same value) step by step? The decision function for linear classifiers is simply:

intercept_ + coef_[0]*feature[0] + coef_[1]*feature[1] + ...

The importance of a feature i is then just coef_[i]*feature[i]. Of course, this is similar to looking at the magnitude of the coefficients, but since the coefficient is multiplied by the actual feature value, and that is exactly what happens under the hood, it might be your best bet.
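Concretely, the per-feature contributions can be computed and verified against scikit-learn's own decision_function in a few lines (synthetic data, illustrative names):

```python
# Decompose a linear classifier's decision function into per-feature
# contributions coef_[i] * feature[i], plus the intercept.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X, y)

x = X[0]
contributions = clf.coef_[0] * x                 # one term per feature
score = clf.intercept_[0] + contributions.sum()  # reconstructed decision value

# Check that the step-by-step sum matches scikit-learn's decision_function
assert np.isclose(score, clf.decision_function(x.reshape(1, -1))[0])
```

Each contribution shifts the log-odds of the positive class, so its sign and magnitude tell you how much that feature's actual value pushed this particular prediction.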
