How to get odds ratios and other related features with scikit-learn
Problem description
I'm going through this tutorial on odds ratios in logistic regression and trying to reproduce exactly the same results with scikit-learn's logistic regression module. With the code below I can get the coefficient and intercept, but I could not find a way to get the other properties of the model listed in the tutorial, such as the log-likelihood, Odds Ratio, Std. Err., z, P>|z|, and [95% Conf. Interval]. If someone could show me how to calculate them with the sklearn package, I would appreciate it.
import pandas as pd
from sklearn.linear_model import LogisticRegression
url = 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/sample.csv'
df = pd.read_csv(url, na_values=[''])
y = df.hon.values                   # 1-D target array, the shape scikit-learn expects
X = df.math.values.reshape(-1, 1)   # 2-D feature matrix with a single column
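# A large C makes the default L2 penalty negligible, approximating an unpenalized fit
clf = LogisticRegression(C=1e5)
clf.fit(X, y)
print(clf.coef_)       # slope for math
print(clf.intercept_)  # intercept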
Recommended answer
You can get the odds ratios by taking the exponential of the coefficients:
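import numpy as np
# Refit on the binary predictor `female`
X = df.female.values.reshape(-1, 1)
clf.fit(X, y)
# exp(coefficient) is the odds ratio for a one-unit increase in the predictor
np.exp(clf.coef_)
# array([[ 1.80891307]])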
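To see why exponentiating a coefficient gives an odds ratio, it may help to write out the model. The short derivation below is a sketch in standard notation (the symbols $p(x)$, $\beta_0$ and $\beta_1$ are introduced here for illustration and do not appear in the original post):

```latex
\begin{align}
\log\frac{p(x)}{1 - p(x)} &= \beta_0 + \beta_1 x \\
\frac{\operatorname{odds}(x+1)}{\operatorname{odds}(x)}
  &= \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}}
   = e^{\beta_1}
\end{align}
```

Read this way, the value 1.809 above says that the odds of `hon = 1` are about 1.81 times as large when `female = 1` as when `female = 0`.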
As for the other statistics, they are not easy to get from scikit-learn, where model evaluation is mostly done using cross-validation. If you need them, you're better off using a different library such as statsmodels.