Can rpy2 code be run in parallel?

Question

I have some Python code that passes a data frame to R via rpy2. R processes it, and I then pull the resulting data.frame back into Python as a pandas DataFrame via com.load_data.

The call to com.load_data works fine in a single Python process, but it crashes when the same code runs concurrently in several multiprocessing.Process processes. I get the following error message from Python:

File "C:\\Python27\\lib\\site-packages\\pandas\\rpy\\common.py", line 29, in load_data
    r.data(name)
TypeError: 'DataFrame' object is not callable

So my question is: is rpy2 simply not designed to be run in parallel, or is this merely a bug in the load_data function? I had assumed that each Python process would get its own independent R session. As far as I can tell, the only workaround would be to have R write its output to a text file, which the corresponding Python process can then open and continue processing. But this is pretty clunky.

Update with some code:

from rpy2.robjects.packages import importr
import rpy2.robjects as ro
import pandas as pd
import pandas.rpy.common as com

# Load C50 library into R environment
C50 = importr('C50')

...

# PANDAS data frame containing test dataset
testing = pd.DataFrame(testing)

# Pass testing dataset to R
rtesting = com.convert_to_r_dataframe(testing)
ro.globalenv['test'] = rtesting

# Strip "AsIs" from each column in the R data frame
# so that predict.C5.0 will work
for c in range(len(testing.columns)):
    ro.r('''class(test[,{0}])=class(test[,{0}])[-match("AsIs", class(test[,{0}]))]'''.format(c+1))

# Make predictions on test dataset (res is pre-existing C5.0 tree)
ro.r('''preds=predict.C5.0(res, newdata=test)''')
ro.r('''preds=as.data.frame(preds)''')

# Get the predictions from R
preds = com.load_data('preds') ### Crashes here when code is run on several processes concurrently

# Further processing as necessary
...

Recommended answer

rpy works by running a Python process and an R process in parallel and exchanging information between them. It does not account for R calls being made in parallel via multiprocessing, so in practice each of the Python processes connects to the same R process. This is probably what causes the issue you see.

One way to circumvent this issue is to implement the parallel processing in R rather than in Python. You send everything to R at once, R processes it in parallel, and the result is sent back to Python.
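The idea of letting R do the parallel work could look roughly like the sketch below. It is only an illustration under several assumptions: the names build_parallel_predict_code, predict_in_r and n_workers are hypothetical, the R objects res and test are assumed to already exist in R's global environment (as in the question's code), and R's parallel package is assumed to be used for the cluster. It has not been tested against the asker's C5.0 model.

```python
def build_parallel_predict_code(n_workers: int) -> str:
    """Return R code that scores `test` with `res` on a local R cluster.

    Hypothetical helper: `res` (fitted C5.0 tree) and `test` (data frame)
    are assumed to exist in R's global environment already.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return f"""
cl <- parallel::makeCluster({n_workers})
parallel::clusterEvalQ(cl, library(C50))   # each worker loads the package
parallel::clusterExport(cl, "res")         # copy the fitted tree to the workers
chunks <- split(test, sort(rep_len(seq_len({n_workers}), nrow(test))))
pred_list <- parallel::parLapply(cl, chunks, function(chunk) predict(res, newdata = chunk))
parallel::stopCluster(cl)
preds <- data.frame(preds = unlist(pred_list, use.names = FALSE))
"""


def predict_in_r(n_workers: int = 4):
    """Run the generated code in the single embedded R session via rpy2."""
    # Imported lazily so the code builder above can be used without R installed.
    import rpy2.robjects as ro

    ro.r(build_parallel_predict_code(n_workers))
    return ro.globalenv["preds"]  # R data.frame; convert to pandas as needed


if __name__ == "__main__":
    # Print the R code that would be sent to R in one call.
    print(build_parallel_predict_code(4))
```

With this shape, only one Python process ever talks to R, and the splitting of rows across workers happens inside R itself, so the conversion back to pandas happens once on the combined result.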
