如何从R在PostgreSQL中编写表? [英] How to write a table in PostgreSQL from R?

查看:156
本文介绍了如何从R在PostgreSQL中编写表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

目前要在PostgreSQL表中插入数据,我必须创建一个空表,然后将插入到表值中... 以及折叠成一个具有所有值的单个字符串。

At present to insert data in a PostgreSQL table I have to create an empty table and then do an insert into table values ... along with a dataframe collapsed insto a single string with all the values. It doesn't work for large sized dataframes.

dbWtriteTable()不适用于PostgreSQL,并给出以下错误...

The dbWtriteTable() doesn't work for PostgreSQL and gives the following error...

Error in postgresqlpqExec(new.con, sql4) : RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "STDIN" LINE 1: COPY "table_1" FROM STDIN



我已尝试回答以下类似问题,尝试过以下破解。这是链接... 如何使用自动递增的主键将数据从R写入PostgreSQL表?

body_lines <- deparse(body(RPostgreSQL::postgresqlWriteTable))
new_body_lines <- sub(
  'postgresqlTableRef(name), "FROM STDIN")', 
  'postgresqlTableRef(name), "(", paste(shQuote(names(value)), collapse = ","), ") FROM STDIN")', 
  body_lines,
  fixed = TRUE
)
fn <- RPostgreSQL::postgresqlWriteTable
body(fn) <- parse(text = new_body_lines)
while("RPostgreSQL" %in% search()) detach("package:RPostgreSQL")
assignInNamespace("postgresqlWriteTable", fn, "RPostgreSQL")

此技巧对我仍然无效。 postgresqlWriteTable()引发完全相同的错误...
这里到底是什么问题?

This hack still doesn't work for me. The postgresqlWriteTable() throws exactly the same error... What exactly is the problem here?

我尝试使用 caroline 软件包中的 dbWriteTable2()。并引发不同的错误...

As an alternative I have tried using dbWriteTable2() from caroline package. And it throws a different error...

Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "id" does not exist in table_1
)
creating NAs/NULLs for for fields of table that are missing in your df
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "id" does not exist in table_1
)

还有其他方法可以将大型数据框直接写入PostgreSQL的表中吗?

Is there any other method to write a large dataframe into a table in PostgreSQL directly?

推荐答案

好吧,我不确定 dbWriteTable()为什么会失败;为什么?可能存在某种版本/协议不匹配的情况。也许您可以尝试安装R的最新版本,RPostgreSQL软件包,并在系统上升级PostgreSQL服务器。

Ok, I'm not sure why dbWriteTable() would be failing; there may be some kind of version/protocol mismatch. Perhaps you could try installing the latest versions of R, the RPostgreSQL package, and upgrading the PostgreSQL server on your system, if possible.

关于插入到大数据失败的解决方法中,这在IT世界中通常是必须移动大量数据并且一次性传输不可行/不切实际/不稳定的操作,这有时被称为批处理 批处理 。基本上,您将数据分成较小的块,然后一次发送每个块。

Regarding the insert into workaround failing for large data, what is often done in the IT world when large amounts of data must be moved and a one-shot transfer is infeasible/impractical/flaky is what is sometimes referred to as batching or batch processing. Basically, you divide the data into smaller chunks and send each chunk one at a time.

作为一个随机示例,几年前,我编写了一些Java代码来查询来自HR LDAP服务器的员工信息,一次只能提供1000条记录。因此,基本上我必须编写一个循环以继续发送相同的请求(使用某种基于cookie的怪异机制),然后将记录累积到本地数据库中,直到服务器报告查询完成为止。

As a random example, a few years ago I wrote some Java code to query for employee information from an HR LDAP server which was constrained to only provide 1000 records at a time. So basically I had to write a loop to keep sending the same request (with the query state tracked using some kind of weird cookie-based mechanism) and accumulating the records into a local database until the server reported the query complete.

这是一些代码,这些代码手动构造SQL以基于给定的data.frame创建一个空表,然后使用参数化的批处理大小将data.frame的内容插入到表中。它主要围绕调用 paste()来构建SQL字符串,以及调用 dbSendQuery()来发送实际查询。我还使用 postgresqlDataType()来创建表。

Here's some code that manually constructs the SQL to create an empty table based on a given data.frame, and then insert the content of the data.frame into the table using a parameterized batch size. It's mostly built around calls to paste() to build the SQL strings, and dbSendQuery() to send the actual queries. I also use postgresqlDataType() for the table creation.

## connect to the DB
library('RPostgreSQL'); ## loads DBI automatically
drv <- dbDriver('PostgreSQL');
con <- dbConnect(drv,host=...,port=...,dbname=...,user=...,password=...);

## define helper functions
createEmptyTable <- function(con,tn,df) {
    sql <- paste0("create table \"",tn,"\" (",paste0(collapse=',','"',names(df),'" ',sapply(df[0,],postgresqlDataType)),");");
    dbSendQuery(con,sql);
    invisible();
};

insertBatch <- function(con,tn,df,size=100L) {
    if (nrow(df)==0L) return(invisible());
    cnt <- (nrow(df)-1L)%/%size+1L;
    for (i in seq(0L,len=cnt)) {
        sql <- paste0("insert into \"",tn,"\" values (",do.call(paste,c(sep=',',collapse='),(',lapply(df[seq(i*size+1L,min(nrow(df),(i+1L)*size)),],shQuote))),");");
        dbSendQuery(con,sql);
    };
    invisible();
};

## generate test data
NC <- 1e2L; NR <- 1e3L; df <- as.data.frame(replicate(NC,runif(NR)));

## run it
tn <- 't1';
dbRemoveTable(con,tn);
createEmptyTable(con,tn,df);
insertBatch(con,tn,df);
res <- dbReadTable(con,tn);
all.equal(df,res);
## [1] TRUE

请注意,我并没有在 row.names 列到数据库表中,与 dbWriteTable()不同,后者似乎总是包含这样的列(并且不会似乎没有提供任何预防措施。)

Note that I didn't bother prepending a row.names column to the database table, unlike dbWriteTable(), which always seems to include such a column (and doesn't seem to provide any means of preventing it).

这篇关于如何从R在PostgreSQL中编写表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆