Statsmodels Poisson glm different than R

Question

I am trying to fit some models (spatial interaction models) based on code that is provided in R. I have been able to get some of the code to work using statsmodels in Python, but some of the results do not match at all. I believe the R and Python code below should give identical results. Does anyone see any differences? Or are there fundamental differences that might be throwing things off? The R code is the original code, which matches the numbers given in a tutorial (found here: http://www.bartlett.ucl.ac.uk/casa/pdf/paper181).

R example code:

library(mosaic)
Data = fetchData('http://dl.dropbox.com/u/8649795/AT_Austria.csv')
Model = glm(Data~Origin+Destination+Dij+offset(log(Offset)), family=poisson(link="log"), data = Data)
cor = cor(Data$Data, Model$fitted, method = "pearson", use = "complete")
rsquared = cor * cor
rsquared

R output:

> Model = glm(Data~Origin+Destination+Dij+offset(log(Offset)), family=poisson(link="log"), data = Data)
Warning messages:
1: glm.fit: fitted rates numerically 0 occurred 
2: glm.fit: fitted rates numerically 0 occurred 
> cor = cor(Data$Data, Model$fitted, method = "pearson", use = "complete")
> rsquared = cor * cor
> rsquared
[1] 0.9753279

Python code:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from scipy.stats import pearsonr

Data = pd.read_csv('http://dl.dropbox.com/u/8649795/AT_Austria.csv')
# Poisson GLM with an offset; the log link is the default for the Poisson family
Model = smf.glm('Data~Origin+Destination+Dij', data=Data, offset=np.log(Data['Offset']), family=sm.families.Poisson()).fit()

# Pearson correlation between the fitted and the observed flows
cor = pearsonr(Model.fittedvalues, Data["Data"])[0]

print("R-squared for doubly-constrained model is: " + str(cor*cor))

Python output:

R-squared for doubly-constrained model is: 0.104758481123

Recommended answer

It looks like GLM has convergence problems here in statsmodels. Maybe in R too, but R only gives these warnings.

Warning messages:
1: glm.fit: fitted rates numerically 0 occurred 
2: glm.fit: fitted rates numerically 0 occurred 

That could mean something like perfect separation in a Logit/Probit context. I'd have to think about what it means for a Poisson model.
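One possible Poisson analogue of perfect separation, sketched below as an illustration only and not verified against this dataset: if every observation tied to some coefficient has a zero count, the log-likelihood keeps improving as that coefficient goes to minus infinity, so the fitted rate is driven towards zero and the optimizer never reaches a finite maximum. The `zero_counts` data and the `poisson_loglik` helper are hypothetical.

```python
import math


def poisson_loglik(counts, rate):
    """Poisson log-likelihood of `counts` under one constant rate."""
    return sum(y * math.log(rate) - rate - math.lgamma(y + 1) for y in counts)


# Hypothetical group of observations (e.g. one factor level) with only zero counts.
zero_counts = [0, 0, 0, 0]

# The group's coefficient b enters through rate = exp(b).
# The log-likelihood keeps increasing as b decreases, so no finite b maximizes it.
for b in (0.0, -5.0, -20.0, -50.0):
    rate = math.exp(b)
    loglik = poisson_loglik(zero_counts, rate)
    print(f"b = {b:6.1f}  rate = {rate:.3e}  loglik = {loglik:.3e}")
```

Whether this is what triggers R's warning here would need to be checked on the actual data.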

R is doing a better, if subtle, job of telling you that something may be wrong with your fit. If you look at the fitted log-likelihood in statsmodels, for instance, it's -1.12e27. That should be a clue right there that something is off.

Using the Poisson model directly (I always prefer maximum likelihood to GLM when possible), I can replicate the R results (but I get a convergence warning). Tellingly, again, the default Newton-Raphson solver fails, so I use BFGS.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from scipy.stats import pearsonr

data = pd.read_csv('http://dl.dropbox.com/u/8649795/AT_Austria.csv')

# Fit by maximum likelihood with BFGS, since the default Newton-Raphson solver fails here
mod = smf.poisson('Data~Origin+Destination+Dij', data=data, offset=np.log(data['Offset'])).fit(method='bfgs')

# Convergence flag reported by the optimizer
print(mod.mle_retvals['converged'])
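To compare with the R-squared from the question, the squared Pearson correlation between observed and predicted counts can be computed from the Poisson fit. A minimal sketch, again on synthetic stand-in data with made-up column names (`x`, `expo`, `y`) rather than the original CSV; the `maxiter=200` setting is my own choice, not something from the answer above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (hypothetical; not the data from the question)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x": rng.normal(size=n), "expo": rng.uniform(1.0, 5.0, size=n)})
df["y"] = rng.poisson(df["expo"] * np.exp(0.5 + 0.8 * df["x"]))

# Poisson maximum likelihood with BFGS; maxiter is raised as a precaution (assumption)
pois_res = smf.poisson("y ~ x", data=df, offset=np.log(df["expo"])).fit(
    method="bfgs", maxiter=200, disp=0
)
print("converged:", pois_res.mle_retvals["converged"])

# predict() with no arguments gives expected counts for the estimation sample
fitted = pois_res.predict()

# Squared Pearson correlation between observed and predicted counts
r_squared = np.corrcoef(df["y"], fitted)[0, 1] ** 2
print("R-squared:", r_squared)
```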
