Trouble with rpy2 and rpart: passing data correctly from Python to R


Question

I am trying to run rpart through rpy2 using Python 2.6.5 and R 10.0.

I create a data frame in Python and pass it along, but I get the following error:

Error in function (x)  : binary operation on non-conformable arrays
Traceback (most recent call last):
  File "partitioningSANDBOX.py", line 86, in <module>
    model=r.rpart(**rpart_params)
  File "build/bdist.macosx-10.3-fat/egg/rpy2/robjects/functions.py", line 83, in __call__
  File "build/bdist.macosx-10.3-fat/egg/rpy2/robjects/functions.py", line 35, in __call__
rpy2.rinterface.RRuntimeError: Error in function (x)  : binary operation on non-conformable arrays

Can anyone help me determine what I am doing wrong to trigger this error?

The relevant part of my code is this:

import numpy as np
import rpy2
import rpy2.robjects as rob
import rpy2.robjects.numpy2ri


#Fire up the interface to R
r = rob.r
r.library("rpart")

datadict = dict(zip(['responsev','predictorv'],[cLogEC,csplitData]))
Rdata = r['data.frame'](**datadict)
Rformula = r['as.formula']('responsev ~.')
#Generate an RPART model in R.
Rpcontrol = r['rpart.control'](minsplit=10, xval=10)
rpart_params = {'formula' : Rformula, \
       'data' : Rdata,
       'control' : Rpcontrol}
model=r.rpart(**rpart_params)

The two variables cLogEC and csplitData are NumPy arrays of float type.

Also, my data frame looks like this:

In [2]: print Rdata
------> print(Rdata)
   responsev predictorv
1  0.6020600        312
2  0.3010300        300
3  0.4771213        303
4  0.4771213        249
5  0.9242793        239
6  1.1986571        297
7  0.7075702        287
8  1.8115750        270
9  0.6020600        296
10 1.3856063        248
11 0.6127839        295
12 0.3010300        283
13 1.1931246        345
14 0.3010300        270
15 0.3010300        251
16 0.3010300        246
17 0.3010300        273
18 0.7075702        252
19 0.4771213        252
20 0.9294189        223
21 0.6127839        252
22 0.7075702        267
23 0.9294189        252
24 0.3010300        378
25 0.3010300        282

And the formula looks like this:

In [3]: print Rformula
------> print(Rformula)
responsev ~ .

Recommended answer

The problem is related to idiosyncratic R code inside rpart. To be precise, it comes from the following block, and in particular from its last line:

m <- match.call(expand.dots = FALSE)
m$model <- m$method <- m$control <- NULL
m$x <- m$y <- m$parms <- m$... <- NULL
m$cost <- NULL
m$na.action <- na.action
m[[1L]] <- as.name("model.frame")
m <- eval(m, parent.frame())

One way to work around this is to avoid entering that block of code (see below). Another might be to construct a nested evaluation from Python, so that parent.frame() behaves. This is not as simple as one would hope, but I may find time to make it easier in the future.

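from rpy2.robjects import DataFrame, Formula
# In the rpy2 release this answer was written for, importing numpy2ri is
# what enables automatic numpy-to-R conversion; later releases may need an
# explicit activation step.
import rpy2.robjects.numpy2ri as npr
import numpy as np
from rpy2.robjects.packages import importr
rpart = importr('rpart')
stats = importr('stats')

# Stand-in data with the same column roles as in the question
cLogEC = np.random.uniform(size=10)
csplitData = np.array(range(10), 'i')

dataf = DataFrame({'responsev': cLogEC,
                   'predictorv': csplitData})
formula = Formula('responsev ~.')
# Passing a prebuilt model frame through the `model` argument makes rpart
# skip the match.call()/eval() block shown above.
rpart.rpart(formula=formula, data=dataf,
            control=rpart.rpart_control(minsplit = 10, xval = 10),
            model = stats.model_frame(formula, data=dataf))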
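
A different workaround, not part of the original answer, is to give the data frame a name in R's global environment and let R parse the whole call itself, so that match.call() sees ordinary symbols rather than objects injected from Python. The following is only a sketch under that assumption: the helper names build_rpart_call and fit_rpart_by_name, and the R-side variable name dataf, are hypothetical, and the rpy2 calls used (robjects.globalenv, robjects.r, DataFrame, FloatVector) should be checked against the rpy2 version installed.

```python
def build_rpart_call(formula, data_name, minsplit=10, xval=10):
    """Return R source code for an rpart() call that refers to the data by name."""
    return ("rpart({formula}, data = {data}, "
            "control = rpart.control(minsplit = {minsplit}, xval = {xval}))"
            ).format(formula=formula, data=data_name,
                     minsplit=int(minsplit), xval=int(xval))


def fit_rpart_by_name(response, predictor, minsplit=10, xval=10):
    """Fit rpart by evaluating the call as R code (hypothetical helper).

    rpy2 is imported here, not at module level, so the string helper
    above can be used and checked without an R installation.
    """
    import rpy2.robjects as ro

    ro.r('library(rpart)')
    # Convert to plain Python floats so no numpy conversion layer is needed.
    dataf = ro.DataFrame({
        'responsev': ro.FloatVector([float(v) for v in response]),
        'predictorv': ro.FloatVector([float(v) for v in predictor]),
    })
    # Bind the data frame to a name that the R code can refer to.
    ro.globalenv['dataf'] = dataf
    return ro.r(build_rpart_call('responsev ~ .', 'dataf', minsplit, xval))
```

With R and the rpart package available, fit_rpart_by_name(cLogEC, csplitData) would be the entry point. The trade-off is that a variable is left behind in R's global environment, which the model_frame approach above avoids.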
