Python Implementation of Logistic Regression as Regression (Not Classification!)


Problem Description

I have a regression problem on which I want to use logistic regression - not logistic classification - because my target variable y is a continuous quantity between 0 and 1. However, the common implementations of logistic regression in Python seem to be exclusively logistic classification. I've also looked at GLM implementations, and none seem to implement a sigmoid link function. Can someone point me in the direction of a Python implementation of logistic regression as a regression algorithm?

Solution

In statsmodels, both GLM with a Binomial family and the discrete model Logit allow a continuous target variable, as long as the values are restricted to the interval [0, 1].

Similarly, Poisson is very useful for modeling non-negative continuous data.

In these cases, the model is estimated by quasi-maximum likelihood (QMLE) rather than MLE, because the distributional assumptions are not correct. Nevertheless, we can correctly (consistently) estimate the mean function. Inference needs to be based on misspecification-robust standard errors, which are available via the fit option cov_type="HC0".

Here is a notebook with an example: https://www.statsmodels.org/dev/examples/notebooks/generated/quasibinomial.html

Some issues with background on QMLE and fractional Logit: https://www.github.com/statsmodels/statsmodels/issues/2040 and on QMLE: https://github.com/statsmodels/statsmodels/issues/2712

Reference

Papke, L.E. and Wooldridge, J.M. (1996), Econometric methods for fractional response variables with an application to 401(k) plan participation rates. J. Appl. Econ., 11: 619-632. https://doi.org/10.1002/(SICI)1099-1255(199611)11:6<619::AID-JAE418>3.0.CO;2-1

Update and Warning

As of statsmodels 0.12:

Investigating this some more, I found that discrete Probit does not support continuous interval data. It uses a computational shortcut that assumes that the values of the dependent variable are either 0 or 1. However, it does not raise an exception in this case. https://github.com/statsmodels/statsmodels/issues/7210

Discrete Logit works correctly for continuous data with the optimization method "newton". The log-likelihood function itself uses a similar computational shortcut as Probit, but the derivatives and other parts of Logit do not.

GLM-Binomial is designed for interval data and has no problems with it. The only numerical precision issue is currently in the Hessian of the probit link, which uses numerical derivatives and is not very precise. This means the parameters are well estimated, but the standard errors can have numerical noise in GLM-Probit.

Update: two changes in statsmodels 0.12.2:
- Probit now raises an exception if the response is not integer valued, and
- GLM Binomial with a Probit link uses improved derivatives for the Hessian, with precision now similar to discrete Probit.
