Logistic regression using statsmodels raises an error in Python


Question

I am trying to fit a logistic regression with statsmodels (I need the summary output), and I get this error:

LinAlgError: Singular matrix

My df is numeric and its features are correlated; I have already deleted the non-numeric and constant features. Because of the correlated features, I tried both a regular regression and one with an L1 penalty (L2 isn't available).

I checked the matrix rank and got the following output:

print(len(df.columns)) -> 156

print(np.linalg.matrix_rank(df.values)) -> 151

How do I know which features are causing the problem, and why?

My code:

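import statsmodels.api as sm

# X: numeric feature matrix, y: binary target
logit = sm.Logit(y, X)

# alpha=0 applies no L1 penalty, so this is an unregularized fit
result = logit.fit_regularized(trim_mode='auto', alpha=0, maxiter=150)

print(result.summary())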

Update:

After removing highly correlated features, the rank matches the number of columns:

  len(df.columns) == np.linalg.matrix_rank(df.values)

but I still get the same error, even when I set a low correlation threshold.

I tried changing the solver as well.

Recommended answer

As suggested in the comments, the model won't run if two features are perfectly correlated. If your pandas DataFrame has a small number of columns, the easiest way to check is to call the .corr() method on it (here, df.corr()) and see whether any pair of features has a correlation of 1.
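For a wider DataFrame, scanning the correlation matrix by eye is impractical, so the same check can be automated. The following is a minimal sketch, not part of the original answer: the helper name perfectly_correlated_pairs, the tolerance tol, and the toy data are all assumptions made for illustration. It also flags a correlation of -1, since a perfectly negative correlation makes two columns just as collinear.

```python
import numpy as np
import pandas as pd

def perfectly_correlated_pairs(df, tol=1e-8):
    """Return (col_a, col_b, corr) for pairs whose |correlation| is within tol of 1.

    Hypothetical helper for illustration; tol is an assumed tolerance.
    """
    corr = df.corr()
    cols = corr.columns
    pairs = []
    # Only visit the upper triangle so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(abs(r) - 1.0) <= tol:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs

# Toy data: "b" is an exact linear function of "a"; "c" is unrelated noise.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
toy = pd.DataFrame({"a": a, "b": 2 * a + 1, "c": rng.normal(size=50)})

pairs = perfectly_correlated_pairs(toy)
print(pairs)
```

On your own data you would call perfectly_correlated_pairs(df) and then decide which column of each reported pair to drop.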

You should also think carefully about why some of your features are perfectly correlated in the first place.
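Pairwise correlation only detects dependence between two columns. A rank deficit like the one reported in the question (rank 151 for 156 columns) can also arise when one column is a linear combination of several others, which .corr() will not show. One possible way to locate such columns is sketched below, under stated assumptions: the helper name dependent_columns and the toy data are hypothetical, and the greedy left-to-right order means the columns it reports depend on column order.

```python
import numpy as np
import pandas as pd

def dependent_columns(df, tol=None):
    """Greedily split columns into those that raise the rank and those that do not.

    Hypothetical helper for illustration. A column lands in `dropped` when adding
    it to the columns kept so far leaves the matrix rank unchanged, i.e. it is
    (numerically) a linear combination of the kept columns.
    """
    kept, dropped = [], []
    rank = 0
    for col in df.columns:
        candidate = df[kept + [col]].to_numpy(dtype=float)
        new_rank = np.linalg.matrix_rank(candidate, tol=tol)
        if new_rank > rank:
            kept.append(col)
            rank = new_rank
        else:
            dropped.append(col)
    return kept, dropped

# Toy data: x3 = x1 + x2, so no pair is perfectly correlated,
# yet the three columns together are linearly dependent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
toy = pd.DataFrame({"x1": x1, "x2": x2, "x3": x1 + x2})

kept, dropped = dependent_columns(toy)
print("kept:", kept)
print("dropped:", dropped)
```

The sketch recomputes the rank once per column, which is fine for a few hundred columns but would be slow for very wide data.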
