Calculate logistic regression in Python


Question

I am trying to fit a logistic regression. My data is in a CSV file that looks like this:

node_id,second_major,gender,major_index,year,dorm,high_school,student_fac
0,0,2,257,2007,111,2849,1
1,0,2,271,2005,0,51195,2
2,0,2,269,2007,0,21462,1
3,269,1,245,2008,111,2597,1
..........................

This is my code:

import pandas as pd
import statsmodels.api as sm
import pylab as pl
import numpy as np

df = pd.read_csv("Reed98.csv")
print df.describe()

dummy_ranks = pd.get_dummies(df['second_major'], prefix='second_major')

cols_to_keep = ['second_major', 'dorm', 'high_school']
data = df[cols_to_keep].join(dummy_ranks.ix[:, 'year':])
train_cols = data.columns[1:]
# Index([gre, gpa, prestige_2, prestige_3, prestige_4], dtype=object)

logit = sm.Logit(data['second_major'], data[train_cols])
result = logit.fit()

print result.summary()

When I run the code in Python, I get this error:

Traceback (most recent call last):
  File "D:\project\logisticregression.py", line 24, in <module>
    result = logit.fit()
  File "c:\python26\lib\site-packages\statsmodels-0.5.0-py2.6-win32.egg\statsmodels\discrete\discrete_model.py", line 282, in fit
    disp=disp, callback=callback, **kwargs)
  File "c:\python26\lib\site-packages\statsmodels-0.5.0-py2.6-win32.egg\statsmodels\discrete\discrete_model.py", line 233, in fit
    disp=disp, callback=callback, **kwargs)
  File "c:\python26\lib\site-packages\statsmodels-0.5.0-py2.6-win32.egg\statsmodels\base\model.py", line 291, in fit
    hess=hess)
  File "c:\python26\lib\site-packages\statsmodels-0.5.0-py2.6-win32.egg\statsmodels\base\model.py", line 341, in _fit_mle_newton
    newparams = oldparams - np.dot(np.linalg.inv(H),
  File "C:\Python26\Lib\site-packages\numpy\linalg\linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
  File "C:\Python26\Lib\site-packages\numpy\linalg\linalg.py", line 328, in solve
    raise LinAlgError('Singular matrix')
LinAlgError: Singular matrix

How should I rewrite the code?

Recommended answer

There's nothing wrong with your code. My guess is that you have missing values in your data. Try calling dropna on the data first, or pass missing='drop' to Logit. You might also check that the right-hand side is full rank with np.linalg.matrix_rank(data[train_cols].values). If the rank is smaller than the number of columns, some columns are linearly dependent, and the Hessian that the Newton solver inverts is singular.
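The two checks suggested in the answer can be sketched as follows. This is only an illustration: the toy DataFrame is made up and stands in for the real CSV, and the helper name check_design and the column dorm_twice are hypothetical. The duplicated column is there on purpose, to show what a rank-deficient right-hand side looks like.

```python
import numpy as np
import pandas as pd


def check_design(frame):
    """Return (rows with missing values, matrix rank, number of columns).

    The rank is computed after dropping rows that contain missing values.
    """
    n_missing_rows = int(frame.isnull().any(axis=1).sum())
    clean = frame.dropna()
    rank = int(np.linalg.matrix_rank(clean.values.astype(float)))
    return n_missing_rows, rank, clean.shape[1]


# Made-up toy data (assumption): one missing value in high_school.
df = pd.DataFrame({
    "dorm": [1, 0, 1, 0, 1, 0],
    "high_school": [3.0, 5.0, np.nan, 2.0, 4.0, 1.0],
    "gender": [2, 2, 2, 1, 1, 2],
})
# A column that is an exact multiple of another one makes the matrix rank deficient.
df["dorm_twice"] = 2 * df["dorm"]

missing, rank, ncols = check_design(df)
print("rows with missing values:", missing)
print("rank:", rank, "of", ncols, "columns")
if rank < ncols:
    print("Design matrix is not full rank; drop or recode the redundant columns.")
```

On the real data, the same helper could be called on data[train_cols]. If the rank comes out lower than the number of columns, removing the redundant column (here dorm_twice) before calling sm.Logit is one way to avoid the singular matrix.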
