Logistic regression using statsmodels: error in Python
Question
I am trying to fit a logistic regression with statsmodels (I need the summary output), and I get this error:
LinAlgError: Singular matrix
My df is entirely numeric and its features are correlated; I have already dropped the non-numeric and constant features. Because of the correlation, I tried both an unpenalized fit and a fit with an L1 penalty (L2 isn't available).
I checked the matrix rank and got this output:
print(len(df.columns)) -> 156
print(np.linalg.matrix_rank(df.values)) -> 151
How can I tell which features are causing the problem, and why?
My code:
import statsmodels.api as sm

logit = sm.Logit(y, X)
result = logit.fit_regularized(trim_mode='auto', alpha=0, maxiter=150)
print(result.summary())
Update:
After removing highly correlated features I get:
len(df.columns) == np.linalg.matrix_rank(df.values)
but I still get the same error, even when I set a low correlation threshold.
I also tried changing the solver.
Recommended answer
As suggested in the comments, the model won't run if two features are perfectly correlated. If you have a pandas DataFrame with a small number of columns, the easiest way to check is to call its .corr() method (here, df.corr()) and look for any pair of features whose correlation is exactly 1 or -1.
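The df.corr() check can be automated when there are too many columns to inspect by eye. The following is a minimal sketch, not the asker's code: the helper name perfectly_correlated_pairs, the tolerance tol, and the toy DataFrame are all assumptions made for illustration.

```python
import pandas as pd


def perfectly_correlated_pairs(df, tol=1e-8):
    """Return column pairs whose absolute Pearson correlation is 1 (within tol).

    Hypothetical helper for illustration; `tol` absorbs floating-point error.
    """
    corr = df.corr().abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            # NaN correlations (e.g. from constant columns) compare as False.
            if corr.iloc[i, j] >= 1 - tol:
                pairs.append((cols[i], cols[j]))
    return pairs


# Toy data: "b" is exactly 2 * "a", while "c" is unrelated.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, 0.0, 1.0, 3.0],
})
pairs = perfectly_correlated_pairs(toy)
print(pairs)
```

On real data you would pass your own feature DataFrame; each reported pair carries the same information, so one column of the pair can usually be dropped.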
You should also think carefully about why some of your features are perfectly correlated in the first place.
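Pairwise correlation only reveals dependence between two columns. A rank deficit like the one in the question (156 columns, rank 151) can also arise when a column is a linear combination of several others, which df.corr() will not flag. One possible way to locate such columns, sketched below under stated assumptions, is to add columns one at a time and record those that do not increase the rank reported by np.linalg.matrix_rank. The function name find_dependent_columns and the demo data are hypothetical, and the result depends on column order.

```python
import numpy as np
import pandas as pd


def find_dependent_columns(df):
    """Greedily split columns into an independent set and the rest.

    Hypothetical helper: a column is 'dependent' if adding it to the
    columns kept so far does not raise the matrix rank.
    """
    kept, dependent = [], []
    for col in df.columns:
        candidate = kept + [col]
        rank = np.linalg.matrix_rank(df[candidate].values)
        if rank == len(candidate):
            kept.append(col)
        else:
            dependent.append(col)
    return kept, dependent


# Demo: x3 = x1 + x2, yet no pair of columns is perfectly correlated.
demo = pd.DataFrame({
    "x1": [1.0, 0.0, 2.0, 1.0],
    "x2": [0.0, 1.0, 1.0, 3.0],
})
demo["x3"] = demo["x1"] + demo["x2"]

kept, dependent = find_dependent_columns(demo)
print("independent:", kept)
print("dependent:", dependent)
```

This runs one rank computation per column, which should be acceptable for a couple of hundred features. It does not explain a singular-matrix error that persists once the data is full rank, as in the question's update.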