foreach %dopar% + RPostgreSQL
Problem description
I am using RPostgreSQL to connect to a local database. The setup works just fine on my Linux machine. R 2.11.1, Postgres 8.4.
I was playing with 'foreach' and the multicore (doMC) parallel backend to wrap some repetitive queries (a few thousand of them) and append the results to a data structure. Curiously, it works if I use %do% but fails when I switch to %dopar%, except when there is only one iteration (as shown below).
I wondered whether it had something to do with using a single connection object, so I created 10 connection objects and gave each query a specific con object chosen by i modulo 10 (the snippet below shows just 2 connection objects, chosen by i modulo 2). The evaluated expression, eval(expr.01), contains the query, which depends on the value of 'i'.
I can't make sense of these particular error messages. I am wondering whether there is any way to make this work.
Thanks.
Vishal Belsare
The R snippet is as follows:
> id.qed2.foreach <- foreach(i = 1588:1588, .inorder=FALSE) %dopar% {
+ if (i %% 2 == 0) {con <- con0};
+ if (i %% 2 == 1) {con <- con1};
+ fetch(dbSendQuery(con,eval(expr.01)),n=-1)$idreuters};
> id.qed2.foreach
[[1]]
[1] 411 414 2140 2406 4490 4507 4519 4570 4571 4572 4703 4731
[109] 48765 84312 91797
> id.qed2.foreach <- foreach(i = 1588:1589, .inorder=FALSE) %dopar% {
+ if (i %% 2 == 0) {con <- con0};
+ if (i %% 2 == 1) {con <- con1};
+ fetch(dbSendQuery(con,eval(expr.01)),n=-1)$idreuters};
Error in stop(paste("expired", class(con))) :
no function to return from, jumping to top level
Error in stop(paste("expired", class(con))) :
no function to return from, jumping to top level
Error in { :
task 1 failed - "error in evaluating the argument 'res' in selecting a method for function 'fetch'"
>
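A likely contributing factor (an interpretation, not something the thread confirms) is how doMC runs tasks: it forks the R session (it is built on mclapply from the parallel package), so every child process starts with a copy of every object that existed in the parent, including both con0 and con1. Choosing con0 or con1 by i %% 2 therefore does not give each worker a connection of its own; every child holds copies of both handles, and those copies still refer to the connections the parent opened. The minimal sketch below assumes a plain environment as a hypothetical stand-in for a connection object so it runs without a database; it only demonstrates fork semantics and says nothing about RPostgreSQL internals.

```r
library(parallel)

# Hypothetical stand-in for a connection object created before the loop
# (real code: con0 <- dbConnect(drv, dbname = 'nseindia')).
con0 <- new.env()
con0$created_in <- Sys.getpid()
con0$uses <- 0L

parent_pid <- Sys.getpid()
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L  # mclapply cannot fork on Windows

seen <- mclapply(1588:1591, function(i) {
  con0$uses <- con0$uses + 1L  # updates this process's copy of con0 only
  c(task_pid = Sys.getpid(), created_in = con0$created_in)
}, mc.cores = n_cores)

# Every task sees the object that the parent created ...
all_from_parent <- all(vapply(seen, function(s) s[["created_in"]] == parent_pid, logical(1)))
# ... yet on Unix-alikes the tasks ran in other processes and their updates were discarded.
task_pids <- unique(vapply(seen, function(s) s[["task_pid"]], integer(1)))

print(all_from_parent)  # TRUE
print(con0$uses)        # 0 on Unix-alikes; 4 on Windows, where the tasks ran serially here
```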
I changed a few things (still unsuccessful), but a few things came to light. Connection objects created in the loop and not closed via dbDisconnect lead to hanging connections, as the PostgreSQL log under /var/log shows. A few new error messages show up when I create the connection inside the loop:
> system.time(
+ id.qed2.foreach <- foreach(i = 1588:1590, .inorder=FALSE,
.packages=c("DBI", "RPostgreSQL")) %dopar% {drv0 <- dbDriver("PostgreSQL");
con0 <- dbConnect(drv0, dbname='nseindia');
list(idreuters=fetch(dbSendQuery(con0,eval(expr.01)),n=-1)$idreuters);
dbDisconnect(con0)})
Error in postgresqlExecStatement(conn, statement, ...) :
no function to return from, jumping to top level
Error in postgresqlExecStatement(conn, statement, ...) :
no function to return from, jumping to top level
Error in postgresqlExecStatement(conn, statement, ...) :
no function to return from, jumping to top level
Error in { :
task 1 failed - "error in evaluating the argument 'res' in selecting a method for function 'fetch'"
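Independently of the error, note that the value of a braced block in R is its last expression, so in the snippet above each task would return the result of dbDisconnect(con0) rather than the list built on the line before it. A common R pattern is to register the cleanup with on.exit() and make the result the last expression; the cleanup then also runs when the query fails, which avoids the hanging connections seen in the PostgreSQL log. The sketch below is a minimal illustration that assumes hypothetical stand-ins open_con(), close_con() and run_query() (the real DBI calls are noted in the comments) so it runs without a database.

```r
# Hypothetical stand-ins so the sketch runs without PostgreSQL.
open_connections <- 0L
open_con <- function() {                 # real code: dbConnect(dbDriver("PostgreSQL"), dbname = 'nseindia')
  open_connections <<- open_connections + 1L
  new.env()
}
close_con <- function(con) {             # real code: dbDisconnect(con)
  open_connections <<- open_connections - 1L
  invisible(TRUE)
}
run_query <- function(con, i) {          # real code: fetch(dbSendQuery(con, sql), n = -1)
  if (i < 0) stop("query failed")
  data.frame(idreuters = c(i, i + 1L))
}

query_one <- function(i) {
  con <- open_con()
  on.exit(close_con(con))                # runs on normal return AND after an error
  res <- run_query(con, i)
  list(idreuters = res$idreuters)        # last expression = value the task returns
}

out <- lapply(1588:1590, query_one)      # the same body could go inside %dopar% { ... }
try(query_one(-1L))                      # the error still triggers close_con()
print(out[[1]])
print(open_connections)                  # 0: nothing left open
```

The sketch uses lapply() so that the open_connections counter lives in a single process; under a forking backend each child would keep its own copy of the counter.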
Recommended answer
The following works and is about 1.5x faster than the sequential version. As a next step, I am wondering whether it is possible to attach a connection object to each of the workers spawned by registerDoMC. If so, there would be no need to create and destroy a connection for every query, which would also avoid overwhelming the PostgreSQL server with connections.
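library(foreach)
library(doMC)
registerDoMC()  # fork-based backend; tasks run in child R processes

pgparquery <- function(i) {
  # Open the connection inside the task rather than sharing one from the parent.
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, dbname = 'nsdq')
  on.exit(dbDisconnect(con))    # close it even if the query fails
  lst <- eval(expr.01)          # contains the SQL query which depends on 'i'
  qry <- dbSendQuery(con, lst)
  tmp <- fetch(qry, n = -1)
  dbClearResult(qry)            # release the result set before returning
  dt <- dates.qed2[i]
  list(date = dt, idreuters = tmp$idreuters)
}

id.qed.foreach <- foreach(i = 1588:3638, .inorder = FALSE,
                          .packages = c("DBI", "RPostgreSQL")) %dopar% {
  pgparquery(i)
}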
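On the follow-up question of reusing one connection per worker: as far as I understand, doMC forks fresh worker processes for each %dopar% call (it is built on mclapply), so there is no long-lived worker to attach a connection to. One workaround, offered here as an assumption rather than anything from the thread, is to split the iteration range into one chunk per core and open a single connection per chunk. The sketch calls mclapply directly and uses hypothetical open_con(), close_con() and run_query() stand-ins so it runs without PostgreSQL.

```r
library(parallel)

# Hypothetical stand-ins so the sketch runs without PostgreSQL.
open_con  <- function() new.env()          # real code: dbConnect(dbDriver("PostgreSQL"), dbname = 'nsdq')
close_con <- function(con) invisible(TRUE) # real code: dbDisconnect(con)
run_query <- function(con, i) i * 10       # real code: fetch(dbSendQuery(con, <SQL for i>), n = -1)

# One connection serves every iteration in the chunk.
process_chunk <- function(idx) {
  con <- open_con()
  on.exit(close_con(con))
  lapply(idx, function(i) list(i = i, value = run_query(con, i)))
}

ids     <- 1588:3638
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L   # mclapply cannot fork on Windows
chunks  <- lapply(splitIndices(length(ids), n_cores), function(k) ids[k])

res <- mclapply(chunks, process_chunk, mc.cores = n_cores)
res <- unlist(res, recursive = FALSE)      # flatten to one list entry per i

length(chunks)   # connections opened: one per chunk
length(res)      # 2051 results, one per iteration
```

With foreach and doMC registered, the same chunks could presumably be passed as foreach(idx = chunks, .combine = c) %dopar% process_chunk(idx). If truly persistent workers are wanted, a socket cluster (parallel::makeCluster, or doParallel on top of it) keeps its workers alive between calls, and clusterEvalQ() can open one connection on each worker at startup and close it at the end.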
--
Vishal Belsare