How to get odds-ratios and other related features with scikit-learn


Problem description

I'm working through this tutorial on odds ratios in logistic regression and trying to get exactly the same results with scikit-learn's logistic regression module. With the code below I can get the coefficient and intercept, but I could not find a way to obtain the other properties of the model listed in the tutorial, such as the log-likelihood, Odds Ratio, Std. Err., z, P>|z|, and [95% Conf. Interval]. If someone could show me how to calculate them with the sklearn package, I would appreciate it.

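import pandas as pd
from sklearn.linear_model import LogisticRegression

# Load the sample data set used in the tutorial
url = 'https://stats.idre.ucla.edu/wp-content/uploads/2016/02/sample.csv'
df = pd.read_csv(url, na_values=[''])

y = df.hon.values                   # binary outcome, 1-D as scikit-learn expects
X = df.math.values.reshape(-1, 1)   # single predictor as a 2-D column

# A large C makes the default L2 penalty negligible,
# so the fit is close to plain maximum likelihood
clf = LogisticRegression(C=1e5)
clf.fit(X, y)
print(clf.coef_)
print(clf.intercept_)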

Recommended answer

You can get the odds ratios by exponentiating the coefficients:

import numpy as np

# Refit on the binary predictor `female`, then exponentiate its coefficient
X = df.female.values.reshape(-1, 1)
clf.fit(X, y)
np.exp(clf.coef_)

# array([[ 1.80891307]])
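
For a single binary predictor, the exponentiated coefficient should match the odds ratio computed directly from the 2x2 table of counts. The sketch below checks this on illustrative counts that are an assumption of this example (they are not read from sample.csv). The variable names are made up for the illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative 2x2 counts (assumed for this sketch, not read from sample.csv)
n_x0_y0, n_x0_y1 = 74, 17   # predictor = 0: outcome 0 / outcome 1
n_x1_y0, n_x1_y1 = 77, 32   # predictor = 1: outcome 0 / outcome 1
counts = [n_x0_y0, n_x0_y1, n_x1_y0, n_x1_y1]

# Expand the table into one row per observation
X = np.repeat([0, 0, 1, 1], counts).reshape(-1, 1)
y = np.repeat([0, 1, 0, 1], counts)

# Odds ratio from the table: odds of y=1 when x=1 divided by odds when x=0
table_odds_ratio = (n_x1_y1 / n_x1_y0) / (n_x0_y1 / n_x0_y0)

# Odds ratio from the model: exp of the fitted coefficient
# (large C keeps the default L2 penalty negligible)
clf = LogisticRegression(C=1e5)
clf.fit(X, y)
model_odds_ratio = np.exp(clf.coef_[0, 0])

print(table_odds_ratio, model_odds_ratio)
```

The two numbers should agree up to solver tolerance and the small remaining effect of regularization.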

As for the other statistics, they are not easy to get from scikit-learn, where model evaluation is mostly done using cross-validation. If you need them, you're better off using a different library such as statsmodels.
