如何从R在PostgreSQL中编写表? [英] How to write a table in PostgreSQL from R?
问题描述
目前要在PostgreSQL表中插入数据,我必须创建一个空表,然后将插入到表值中...
以及折叠成一个具有所有值的单个字符串。
At present to insert data in a PostgreSQL table I have to create an empty table and then do an insert into table values ...
along with a dataframe collapsed insto a single string with all the values. It doesn't work for large sized dataframes.
dbWtriteTable()
不适用于PostgreSQL,并给出以下错误...
The dbWtriteTable()
doesn't work for PostgreSQL and gives the following error...
Error in postgresqlpqExec(new.con, sql4) : RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "STDIN" LINE 1: COPY "table_1" FROM STDIN
我已尝试回答以下类似问题,尝试过以下破解。这是链接... 如何使用自动递增的主键将数据从R写入PostgreSQL表?
body_lines <- deparse(body(RPostgreSQL::postgresqlWriteTable))
new_body_lines <- sub(
'postgresqlTableRef(name), "FROM STDIN")',
'postgresqlTableRef(name), "(", paste(shQuote(names(value)), collapse = ","), ") FROM STDIN")',
body_lines,
fixed = TRUE
)
fn <- RPostgreSQL::postgresqlWriteTable
body(fn) <- parse(text = new_body_lines)
while("RPostgreSQL" %in% search()) detach("package:RPostgreSQL")
assignInNamespace("postgresqlWriteTable", fn, "RPostgreSQL")
此技巧对我仍然无效。 postgresqlWriteTable()
引发完全相同的错误...
这里到底是什么问题?
This hack still doesn't work for me. The postgresqlWriteTable()
throws exactly the same error...
What exactly is the problem here?
我尝试使用 caroline
软件包中的 dbWriteTable2()
。并引发不同的错误...
As an alternative I have tried using dbWriteTable2()
from caroline
package. And it throws a different error...
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: column "id" does not exist in table_1
)
creating NAs/NULLs for for fields of table that are missing in your df
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: column "id" does not exist in table_1
)
还有其他方法可以将大型数据框直接写入PostgreSQL的表中吗?
Is there any other method to write a large dataframe into a table in PostgreSQL directly?
推荐答案
好吧,我不确定 dbWriteTable()
为什么会失败;为什么?可能存在某种版本/协议不匹配的情况。也许您可以尝试安装R的最新版本,RPostgreSQL软件包,并在系统上升级PostgreSQL服务器。
Ok, I'm not sure why dbWriteTable()
would be failing; there may be some kind of version/protocol mismatch. Perhaps you could try installing the latest versions of R, the RPostgreSQL package, and upgrading the PostgreSQL server on your system, if possible.
关于插入到大数据失败的解决方法中,这在IT世界中通常是必须移动大量数据并且一次性传输不可行/不切实际/不稳定的操作,这有时被称为批处理或 批处理 。基本上,您将数据分成较小的块,然后一次发送每个块。
Regarding the insert into
workaround failing for large data, what is often done in the IT world when large amounts of data must be moved and a one-shot transfer is infeasible/impractical/flaky is what is sometimes referred to as batching or batch processing. Basically, you divide the data into smaller chunks and send each chunk one at a time.
作为一个随机示例,几年前,我编写了一些Java代码来查询来自HR LDAP服务器的员工信息,一次只能提供1000条记录。因此,基本上我必须编写一个循环以继续发送相同的请求(使用某种基于cookie的怪异机制),然后将记录累积到本地数据库中,直到服务器报告查询完成为止。
As a random example, a few years ago I wrote some Java code to query for employee information from an HR LDAP server which was constrained to only provide 1000 records at a time. So basically I had to write a loop to keep sending the same request (with the query state tracked using some kind of weird cookie-based mechanism) and accumulating the records into a local database until the server reported the query complete.
这是一些代码,这些代码手动构造SQL以基于给定的data.frame创建一个空表,然后使用参数化的批处理大小将data.frame的内容插入到表中。它主要围绕调用 paste()
来构建SQL字符串,以及调用 dbSendQuery()
来发送实际查询。我还使用 postgresqlDataType()
来创建表。
Here's some code that manually constructs the SQL to create an empty table based on a given data.frame, and then insert the content of the data.frame into the table using a parameterized batch size. It's mostly built around calls to paste()
to build the SQL strings, and dbSendQuery()
to send the actual queries. I also use postgresqlDataType()
for the table creation.
## connect to the DB
library('RPostgreSQL'); ## loads DBI automatically
drv <- dbDriver('PostgreSQL');
con <- dbConnect(drv,host=...,port=...,dbname=...,user=...,password=...);
## define helper functions
createEmptyTable <- function(con,tn,df) {
sql <- paste0("create table \"",tn,"\" (",paste0(collapse=',','"',names(df),'" ',sapply(df[0,],postgresqlDataType)),");");
dbSendQuery(con,sql);
invisible();
};
insertBatch <- function(con,tn,df,size=100L) {
if (nrow(df)==0L) return(invisible());
cnt <- (nrow(df)-1L)%/%size+1L;
for (i in seq(0L,len=cnt)) {
sql <- paste0("insert into \"",tn,"\" values (",do.call(paste,c(sep=',',collapse='),(',lapply(df[seq(i*size+1L,min(nrow(df),(i+1L)*size)),],shQuote))),");");
dbSendQuery(con,sql);
};
invisible();
};
## generate test data
NC <- 1e2L; NR <- 1e3L; df <- as.data.frame(replicate(NC,runif(NR)));
## run it
tn <- 't1';
dbRemoveTable(con,tn);
createEmptyTable(con,tn,df);
insertBatch(con,tn,df);
res <- dbReadTable(con,tn);
all.equal(df,res);
## [1] TRUE
请注意,我并没有在 row.names
列到数据库表中,与 dbWriteTable()
不同,后者似乎总是包含这样的列(并且不会似乎没有提供任何预防措施。)
Note that I didn't bother prepending a row.names
column to the database table, unlike dbWriteTable()
, which always seems to include such a column (and doesn't seem to provide any means of preventing it).
这篇关于如何从R在PostgreSQL中编写表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!