Unable to run logistic regression due to "perfect separation error"

Views: 925 Posted: 2020/5/4 3:19:48 Tags: python numpy pandas matplotlib logistic-regression

Question

I'm a beginner to data analysis in Python and have been having trouble with this particular assignment. I've searched quite widely, but have not been able to identify what's wrong.

I imported a file and set it up as a dataframe. Cleaned the data within the file. However, when I try to fit my model to the data, I get a

Perfect separation detected, results not available

Here is the code:

from scipy import stats
import numpy as np
import pandas as pd 
import collections
import matplotlib.pyplot as plt
import statsmodels.api as sm

loansData = pd.read_csv('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv')

# to_csv returns None when given a path, so do not reassign loansData here
loansData.to_csv('loansData_clean.csv', header=True, index=False)

## cleaning the file
loansData['Interest.Rate'] = loansData['Interest.Rate'].map(lambda x:  round(float(x.rstrip('%')) / 100, 4))
loanlength = loansData['Loan.Length'].map(lambda x: x.strip('months'))
loansData['FICO.Range'] = loansData['FICO.Range'].map(lambda x: x.split('-'))
loansData['FICO.Range'] = loansData['FICO.Range'].map(lambda x: int(x[0]))
loansData['FICO.Score'] = loansData['FICO.Range']

#add interest rate less than column and populate
## we only care about interest rates less than 12%
loansData['IR_TF'] = pd.Series('', index=loansData.index)
loansData['IR_TF'] = loansData['Interest.Rate'].map(lambda x: True if x < 12 else False)

#create intercept column
loansData['Intercept'] = pd.Series(1.0, index=loansData.index)

# create list of ind var col names
ind_vars = ['FICO.Score', 'Amount.Requested', 'Intercept'] 

#define logistic regression
logit = sm.Logit(loansData['IR_TF'], loansData[ind_vars])

#fit the model
result = logit.fit()

#get fitted coef
coeff = result.params

print(coeff)

Any help would be much appreciated!

Thx, A

Recommended answer

You get a PerfectSeparationError because your loansData['IR_TF'] contains only a single value, True (or 1). You first converted the interest rate from a percentage to a decimal, so every rate is below 12 and the comparison is always true. You should define IR_TF as

loansData['IR_TF'] = loansData['Interest.Rate'] < 0.12 #not 12, and you don't need .map
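
A quick way to confirm this kind of diagnosis is to count the distinct values of the outcome before fitting. The sketch below uses a handful of made-up decimal rates (not the real loan data) and plain Python. With pandas, the equivalent check would be loansData['IR_TF'].value_counts().

```python
from collections import Counter

# Hypothetical interest rates, already converted from percent to decimal
rates = [0.0890, 0.1212, 0.2198, 0.0999, 0.1171]

# Buggy threshold: every decimal rate is below 12, so only True appears
labels_buggy = [r < 12 for r in rates]

# Corrected threshold: 12% expressed as a decimal
labels_fixed = [r < 0.12 for r in rates]

print(Counter(labels_buggy))  # Counter({True: 5})
print(Counter(labels_fixed))  # Counter({True: 3, False: 2})
```

If the outcome has only one distinct value, there is nothing for the logistic regression to separate, and the fit cannot produce usable estimates.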

Then your regression will run successfully:

Optimization terminated successfully.
         Current function value: 0.319503
         Iterations 8
FICO.Score           0.087423
Amount.Requested    -0.000174
Intercept          -60.125045
dtype: float64
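
As a rough illustration of how to read these coefficients, they can be plugged into the logistic function to get a predicted probability that a loan's interest rate is below 12%. The applicant values used here (FICO score 720, $10,000 requested) are made up, and predict_prob is a hypothetical helper written for this sketch, not part of statsmodels.

```python
import math

# Coefficients copied from the fitted output above
coeff = {
    'FICO.Score': 0.087423,
    'Amount.Requested': -0.000174,
    'Intercept': -60.125045,
}

def predict_prob(fico_score, amount_requested):
    """Hypothetical helper: P(interest rate < 12%) under the fitted logit model."""
    z = (coeff['Intercept']
         + coeff['FICO.Score'] * fico_score
         + coeff['Amount.Requested'] * amount_requested)
    # Logistic (sigmoid) function maps the linear score to a probability
    return 1.0 / (1.0 + math.exp(-z))

p = predict_prob(720, 10000)  # made-up applicant
print(round(p, 3))
```

The positive FICO.Score coefficient means a higher score raises this probability, while the negative Amount.Requested coefficient means a larger request lowers it.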

Also, I noticed several places where the code could be made easier to read and/or faster. In particular, .map may not be as fast as vectorized calculations. Here are my suggestions:

from scipy import stats
import numpy as np
import pandas as pd 
import collections
import matplotlib.pyplot as plt
import statsmodels.api as sm

loansData = pd.read_csv('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv')

## cleaning the file
loansData['Interest.Rate'] = loansData['Interest.Rate'].str.rstrip('%').astype(float).round(2) / 100.0

loanlength = loansData['Loan.Length'].str.strip('months')#.astype(int)  --> loanlength not used below

loansData['FICO.Score'] = loansData['FICO.Range'].str.split('-', expand=True)[0].astype(int)

#add interest rate less than column and populate
## we only care about interest rates less than 12%
loansData['IR_TF'] = loansData['Interest.Rate'] < 0.12

#create intercept column
loansData['Intercept'] = 1.0

# create list of ind var col names
ind_vars = ['FICO.Score', 'Amount.Requested', 'Intercept'] 

#define logistic regression
logit = sm.Logit(loansData['IR_TF'], loansData[ind_vars])

#fit the model
result = logit.fit()

#get fitted coef
coeff = result.params

#print coeff
print(result.summary())  # result has more information


Logit Regression Results                           
==============================================================================
Dep. Variable:                  IR_TF   No. Observations:                 2500
Model:                          Logit   Df Residuals:                     2497
Method:                           MLE   Df Model:                            2
Date:                Thu, 07 Jan 2016   Pseudo R-squ.:                  0.5243
Time:                        23:15:54   Log-Likelihood:                -798.76
converged:                       True   LL-Null:                       -1679.2
                                        LLR p-value:                     0.000
====================================================================================
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
FICO.Score           0.0874      0.004     24.779      0.000         0.081     0.094
Amount.Requested    -0.0002    1.1e-05    -15.815      0.000        -0.000    -0.000
Intercept          -60.1250      2.420    -24.840      0.000       -64.869   -55.381
====================================================================================
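
To check that the vectorized cleaning steps above give the same values as the .map versions in the question, the two can be compared on a tiny frame. This is only a sketch: the three rows are made up to mimic the raw string format of the columns, not taken from the real file.

```python
import pandas as pd

# Made-up rows in the same string format as the raw columns
df = pd.DataFrame({
    'Interest.Rate': ['8.90%', '12.12%', '21.98%'],
    'FICO.Range': ['735-739', '715-719', '690-694'],
})

# Element-wise version, as in the question
rate_map = df['Interest.Rate'].map(lambda x: round(float(x.rstrip('%')) / 100, 4))
fico_map = df['FICO.Range'].map(lambda x: int(x.split('-')[0]))

# Vectorized version, as in the suggestions above
rate_vec = df['Interest.Rate'].str.rstrip('%').astype(float).round(2) / 100.0
fico_vec = df['FICO.Range'].str.split('-', expand=True)[0].astype(int)

# The two approaches should agree up to floating-point noise
print((rate_map - rate_vec).abs().max() < 1e-12)
print((fico_map == fico_vec).all())
```

Any speed difference only matters on larger data, but the vectorized form also reads closer to the intent of each step.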

By the way -- is this P2P lending data?
