使用pandas数据框进行rpy2回归的最小示例 [英] Minimal example of rpy2 regression using pandas data frame

查看:96
本文介绍了使用pandas数据框进行rpy2回归的最小示例的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用pandas数据框进行线性回归的推荐方法(如果有)是什么?我可以做到,但是我的方法似乎很复杂.我会使事情变得不必要地复杂吗?

R代码,以供比较:

x <- c(1,2,3,4,5)
y <- c(2,1,3,5,4)
M <- lm(y~x)
summary(M)$coefficients
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

现在,我的python(2.7.10),rpy2(2.6.0)和熊猫(0.16.1) 版本:

import pandas
import pandas.rpy.common as common
from rpy2 import robjects
from rpy2.robjects.packages import importr

base = importr('base')
stats = importr('stats')

dataframe = pandas.DataFrame({'x': [1,2,3,4,5], 
                              'y': [2,1,3,5,4]})

robjects.globalenv['dataframe']\
   = common.convert_to_r_dataframe(dataframe) 

M = stats.lm('y~x', data=base.as_symbol('dataframe'))

print(base.summary(M).rx2('coefficients'))

            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

顺便说一句,我确实导入了pandas.rpy.common的FutureWarning.但是,当我尝试pandas2ri.py2ri(dataframe)将数据帧从熊猫转换为R时(如此处),我得到

NotImplementedError: Conversion 'py2ri' not defined for objects of type '<class 'pandas.core.series.Series'>'

解决方案

R和Python并不完全相同,因为您在Python/rpy2中构建了数据框,而在R中使用了向量(无数据框).

否则,使用rpy2进行的转换似乎在这里起作用:

from rpy2.robjects import pandas2ri
pandas2ri.activate()
robjects.globalenv['dataframe'] = dataframe
M = stats.lm('y~x', data=base.as_symbol('dataframe'))

结果:

>>> print(base.summary(M).rx2('coefficients'))
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

What is the recommended way (if any) for doing linear regression using a pandas dataframe? I can do it, but my method seems very elaborate. Am I making things unnecessarily complicated?

The R code, for comparison:

x <- c(1,2,3,4,5)
y <- c(2,1,3,5,4)
M <- lm(y~x)
summary(M)$coefficients
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

Now, my python (2.7.10), rpy2 (2.6.0), and pandas (0.16.1) version:

import pandas
import pandas.rpy.common as common
from rpy2 import robjects
from rpy2.robjects.packages import importr

base = importr('base')
stats = importr('stats')

dataframe = pandas.DataFrame({'x': [1,2,3,4,5], 
                              'y': [2,1,3,5,4]})

robjects.globalenv['dataframe']\
   = common.convert_to_r_dataframe(dataframe) 

M = stats.lm('y~x', data=base.as_symbol('dataframe'))

print(base.summary(M).rx2('coefficients'))

            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

By the way, I do get a FutureWarning on the import of pandas.rpy.common. However, when I tried the pandas2ri.py2ri(dataframe) to convert a dataframe from pandas to R (as mentioned here), I get

NotImplementedError: Conversion 'py2ri' not defined for objects of type '<class 'pandas.core.series.Series'>'

解决方案

The R and Python are not strictly identical because you build a data frame in Python/rpy2 whereas you use vectors (without a data frame) in R.

Otherwise, the conversion shipping with rpy2 appears to be working here:

from rpy2.robjects import pandas2ri
pandas2ri.activate()
robjects.globalenv['dataframe'] = dataframe
M = stats.lm('y~x', data=base.as_symbol('dataframe'))

The result:

>>> print(base.summary(M).rx2('coefficients'))
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

这篇关于使用pandas数据框进行rpy2回归的最小示例的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆