Rpy2: pandas dataframe can't fit in R

Problem description

I need to read a CSV file with Python (into a pandas DataFrame), work on it in R, and then return to Python. To pass the pandas DataFrame to R as an R data frame I use rpy2, and that part works fine (code below).

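from pandas import read_csv, DataFrame
# pandas.rpy.common only exists in old pandas releases (it was removed in pandas 0.20)
import pandas.rpy.common as com
import rpy2.robjects as robjects

r = robjects.r
r.library("fitdistrplus")  # load the R package that provides fitdist()

df = read_csv('./datos.csv')
# convert the pandas DataFrame to an R data.frame via pandas' own rpy2 bridge
r_df = com.convert_to_r_dataframe(df)
print(type(r_df))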

The output is:

<class 'rpy2.robjects.vectors.FloatVector'>

But then I try to fit a distribution in R:

fit2 = r.fitdist(r_df, "weibull")

But I get this error:

RRuntimeError: Error in (function (data, distr, method = c("mle", "mme", "qme", "mge"),  : 
data must be a numeric vector of length greater than 1

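I have two questions:
1. What am I doing wrong?
2. Is this the most efficient way to pass a pandas DataFrame to R? I ask because I have also seen this import: from rpy2.robjects.packages import importr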

Here is the data I am reading: https://mega.co.nz/#!P8MEDSzQ!iQyxt73a5pRvJNOxWeSEaFlsVS7_A1sZCAXkUFBLJa0

I use IPython 2.1. Thanks!

Recommended answer

You have two problems:

First, you are trying to use a data frame where you really need a vector. (If you tried using an R data.frame for fitdist(), you'd also get an error.)
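To make the data frame vs. vector distinction concrete without involving R, here is a small sketch that needs only pandas. The CSV text is a made-up stand-in for datos.csv, assumed (as in the answer's fix) to hold an index column plus one numeric column. Even with a single data column, read_csv returns a 2-D DataFrame; squeezing out the column axis gives a 1-D Series, which is the pandas counterpart of the numeric vector fitdist() wants.

```python
import io

import pandas as pd

# Hypothetical stand-in for datos.csv: an index column plus one numeric column.
csv_text = "id,value\n1,2.5\n2,3.0\n3,1.5\n4,2.25\n"

# read_csv always returns a 2-D DataFrame, even with a single data column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(type(df).__name__, df.ndim, df.shape)  # DataFrame 2 (4, 1)

# Squeezing the column axis gives a 1-D Series: the pandas analogue of the
# numeric vector that fitdist() expects as its data argument.
series = df.squeeze("columns")
print(type(series).__name__, series.ndim, len(series))  # Series 1 4
```

A vector built from this Series (rather than from the whole frame) is what should be handed to R.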

Second, the pandas<->rpy2 support provided by pandas is buggy, resulting in conversion of your (presumably) numeric pandas data frame to a string/character R data frame:

In [27]: r.sapply(r_df, r["class"])
Out[27]: 
<StrVector - Python:0x1097757a0 / R:0x7fa41c6b0b68>
[str, str, str, str]

This is not good! The following code fixes these errors:

from pandas import read_csv
import rpy2.robjects as robjects

r = robjects.r
r.library("fitdistrplus")

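# this will read in your csv file as a Series, rather than a DataFrame
# (.squeeze("columns") replaces read_csv's old squeeze=True argument, which newer pandas removed)
series = read_csv('datos.csv', index_col=0).squeeze("columns")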

# do the conversion directly, so that we get an R Vector, rather than a 
# data frame, and we know that it's a numeric type
r_vec = robjects.FloatVector(series)

fit2 = r.fitdist(r_vec, "weibull")
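Because the original trouble came from the data reaching R with the wrong shape and type, a cheap defensive step is to check the pandas side before building the R vector. The sketch below needs only pandas (no R session). The helper name as_fitdist_input is hypothetical, and its checks simply mirror the "numeric vector of length greater than 1" condition quoted in the fitdist() error above. It returns a plain list of floats.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def as_fitdist_input(data):
    """Hypothetical helper: validate data and return it as a list of floats.

    The checks mirror fitdist()'s complaint that data must be a numeric
    vector of length greater than 1, so problems surface in Python instead
    of as an RRuntimeError.
    """
    # A DataFrame is 2-D; fitdist() wants a 1-D vector, so reject it outright.
    if not isinstance(data, pd.Series):
        raise TypeError(f"expected a pandas Series, got {type(data).__name__}")
    # A non-numeric dtype (e.g. strings) cannot give fitdist() a numeric vector.
    if not is_numeric_dtype(data):
        raise TypeError(f"expected a numeric Series, got dtype {data.dtype}")
    values = data.astype(float).tolist()
    if len(values) <= 1:
        raise ValueError("need more than one value to fit a distribution")
    return values


values = as_fitdist_input(pd.Series([2.5, 3.0, 1.5, 2.25]))
print(values)  # [2.5, 3.0, 1.5, 2.25]
```

In the answer's code this could slot in as r_vec = robjects.FloatVector(as_fitdist_input(series)), since FloatVector accepts a list of Python floats.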
