foreach %dopar% + RPostgreSQL

Problem description

I am using RPostgreSQL to connect to a local database. The setup works just fine on my Linux machine. R 2.11.1, Postgres 8.4.

I was experimenting with 'foreach' and the multicore (doMC) parallel backend to wrap a few thousand repetitive queries and append the results to a data structure. Curiously, it works if I use %do% but fails when I switch to %dopar%, except when there is only one iteration (as shown below).

I wondered whether it had something to do with using a single connection object, so I created 10 connection objects and picked one for each query according to i modulo 10 (the snippet below shows just 2 connection objects). The expression evaluated by eval(expr.01) is the query, which depends on the value of 'i'.

I can't make sense of these particular error messages. I am wondering whether there is any way to make this work.

Thanks.
Vishal Belsare

The R snippet follows:

> id.qed2.foreach <- foreach(i = 1588:1588, .inorder=FALSE) %dopar% { 
+ if (i %% 2 == 0) {con <- con0}; 
+ if (i %% 2 == 1) {con <- con1}; 
+ fetch(dbSendQuery(con,eval(expr.01)),n=-1)$idreuters};
> id.qed2.foreach
[[1]]
  [1]   411   414  2140  2406  4490  4507  4519  4570  4571  4572  4703  4731
[109] 48765 84312 91797

> id.qed2.foreach <- foreach(i = 1588:1589, .inorder=FALSE) %dopar% { 
+ if (i %% 2 == 0) {con <- con0}; 
+ if (i %% 2 == 1) {con <- con1}; 
+ fetch(dbSendQuery(con,eval(expr.01)),n=-1)$idreuters};
Error in stop(paste("expired", class(con))) : 
  no function to return from, jumping to top level
Error in stop(paste("expired", class(con))) : 
  no function to return from, jumping to top level
Error in { : 
  task 1 failed - "error in evaluating the argument 'res' in selecting a method for function 'fetch'"
> 

I changed a few things (still unsuccessfully), but a few things came to light. Connection objects created inside the loop and not closed via dbDisconnect lead to hanging connections, as is evident from the Postgres logs under /var/log. A few new error messages show up when I do this:

> system.time(
+ id.qed2.foreach <- foreach(i = 1588:1590, .inorder=FALSE, 
.packages=c("DBI", "RPostgreSQL")) %dopar% {drv0 <- dbDriver("PostgreSQL"); 
con0 <- dbConnect(drv0, dbname='nseindia');
list(idreuters=fetch(dbSendQuery(con0,eval(expr.01)),n=-1)$idreuters);
dbDisconnect(con0)})
Error in postgresqlExecStatement(conn, statement, ...) : 
  no function to return from, jumping to top level
Error in postgresqlExecStatement(conn, statement, ...) : 
  no function to return from, jumping to top level
Error in postgresqlExecStatement(conn, statement, ...) : 
  no function to return from, jumping to top level
Error in { : 
  task 1 failed - "error in evaluating the argument 'res' in selecting a method for function 'fetch'"
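
One detail in the attempt above may be worth noting: the last expression in the loop body is dbDisconnect(con0), so each iteration would return its value rather than the list, and if the query raises an error the disconnect is never reached, which would be consistent with the hanging connections. A minimal sketch of one way to guarantee the cleanup uses on.exit. The helper names with_connection and query_idreuters are hypothetical, and using dbGetQuery in place of dbSendQuery plus fetch is a substitution, not what the original snippet does.

```r
# Hypothetical helper: open a connection, run work(con), and always
# close the connection afterwards, even if work(con) raises an error.
with_connection <- function(connect, disconnect, work) {
  con <- connect()
  on.exit(disconnect(con), add = TRUE)
  work(con)  # the value of work(con) is what the caller gets back
}

# Assumed usage with RPostgreSQL; the database name is supplied by the caller.
query_idreuters <- function(sql, dbname) {
  with_connection(
    connect    = function() DBI::dbConnect(RPostgreSQL::PostgreSQL(), dbname = dbname),
    disconnect = function(con) DBI::dbDisconnect(con),
    work       = function(con) DBI::dbGetQuery(con, sql)$idreuters
  )
}
```

Inside the loop this would be called as query_idreuters(eval(expr.01), "nseindia"), so the query result is the last value of the body and the disconnect happens as a side effect on exit.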

Recommended answer

The following works and runs about 1.5x faster than the sequential form. As a next step, I am wondering whether it is possible to attach a connection object to each of the workers spawned by registerDoMC. If so, there would be no need to create and destroy a connection object on every iteration, which would avoid overwhelming the PostgreSQL server with connections.

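library(foreach)
library(doMC)         # multicore backend, as used in the question
library(RPostgreSQL)
registerDoMC()

# expr.01 and dates.qed2 come from the surrounding session (not shown here)
pgparquery <- function(i) {
  # Open a fresh connection inside the worker for this iteration
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, dbname = 'nsdq')
  lst <- eval(expr.01)  # contains the SQL query, which depends on 'i'
  qry <- dbSendQuery(con, lst)
  tmp <- fetch(qry, n = -1)
  dt <- dates.qed2[i]
  dbDisconnect(con)
  result <- list(date = dt, idreuters = tmp$idreuters)
  return(result)
}

id.qed.foreach <- foreach(i = 1588:3638, .inorder = FALSE,
                          .packages = c("DBI", "RPostgreSQL")) %dopar% {
  pgparquery(i)
}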
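
On the next step raised above, I do not know of a way to attach a connection to each worker directly. One possible approximation is to hand each task a chunk of indices, so that a single connection is opened per chunk rather than per iteration. The following is only a sketch under that assumption: chunk_indices, run_chunk and par_query are hypothetical names, make_query stands in for a function that returns the SQL for a given i (for example function(i) eval(expr.01)), and dbGetQuery replaces the dbSendQuery plus fetch pair.

```r
# Split the iteration indices into at most n_chunks contiguous chunks.
chunk_indices <- function(indices, n_chunks) {
  n_chunks <- max(1L, min(as.integer(n_chunks), length(indices)))
  group <- ceiling(seq_along(indices) * n_chunks / length(indices))
  unname(split(indices, group))
}

# Run every query of one chunk over a single connection.
run_chunk <- function(chunk, make_query, dbname) {
  con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(), dbname = dbname)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # closed even if a query fails
  lapply(chunk, function(i) {
    list(i = i, idreuters = DBI::dbGetQuery(con, make_query(i))$idreuters)
  })
}

# Distribute the chunks over the registered parallel backend.
par_query <- function(indices, make_query, dbname, n_chunks) {
  `%dopar%` <- foreach::`%dopar%`
  chunks <- chunk_indices(indices, n_chunks)
  nested <- foreach::foreach(chunk = chunks, .inorder = FALSE,
                             .packages = c("DBI", "RPostgreSQL")) %dopar% {
    run_chunk(chunk, make_query, dbname)
  }
  unlist(nested, recursive = FALSE)  # flatten to one list entry per index
}

# Assumed usage, after library(doMC); registerDoMC(cores = 4):
# res <- par_query(1588:3638, function(i) eval(expr.01), "nsdq", n_chunks = 4)
```

With n_chunks equal to the number of cores, the server would see at most that many simultaneous connections from this job, at the cost of coarser load balancing than one task per index.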

--
Vishal Belsare
