当添加的列较少时,追加到现有的SQLite表,而无需将数据库读入R [英] Appending to existing SQLite table when addition has fewer columns, without reading database into R

查看:235
本文介绍了当添加的列较少时,追加到现有的SQLite表,而无需将数据库读入R的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在SQL端还是R端,是否有一些简单的方法可以将data.frame追加到具有更多列的现有表中?缺少的列应仅用NA填充.如果它能够优雅地处理表2的列多于表1的话,是否有加分?

library(RSQLite)

# Create
db <- dbConnect( SQLite(), dbname="~/temp/test.sqlite" )

# Write test
set.seed(1)
n <- 1000
testDat <- data.frame(key=seq(n), x=runif(n),y=runif(n),g1=sample(letters[1:10],n,replace=TRUE),g2=rep(letters[1:10],each=n/10),g3=factor( sample(letters[1:10],n,replace=TRUE) ))
if(dbExistsTable(db,"test")) dbRemoveTable(db,"test")
dbWriteTable( conn = db, name = "test", value = testDat, row.names=FALSE )
testDat2 <- data.frame( key=seq(n+1,n+100), x=runif(100) )
> dbWriteTable( conn = db, name="test", value = testDat2, row.names=FALSE, append=TRUE  )
[1] FALSE
Warning message:
In value[[3L]](cond) :
  RS-DBI driver: (error in statement: table test has 6 columns but 2 values were supplied)

我也可以为此设想一个包装器.该算法将类似于:

  1. 从现有的SQL表中读取1行.
  2. 从该读取中获取列名.
  3. 使用未包含的名称将列名称添加到data.frame中;填补缺失.
  4. dbWriteTable,现在data.frame与SQL表具有相同的列.

SQLite具有可变类,从而简化了这一过程.但是我宁愿不发明轮子,如果它已经存在.

编辑

仅需澄清一下:这些数据集很大. SQL数据库约为30GB,而data.frame(出于明显的原因实际上是data.table)约为4GB.因此,需要将SQL表读入R的解决方案是非入门级的.

另一种算法是在SQL中进行:

  1. 将R data.frame写入临时SQL表.
  2. 将表附加到主SQL表上的SQL魔术.
  3. 删除临时SQL表.
  4. 利润.

解决方案

一种解决方案是使用dbSendQuery读取一个表(例如1000行)并将其添加到另一表中(添加所需的列).

res <- dbSendQuery(con, "SELECT * from tests")
while(!dbHasCompleted(res)) {
  data <- fetch(res, n = 1000);
  ### Put the data in the other table
}

还有一种方法可以在单个SQLite查询中执行.如果您知道要添加哪些列(用NULL填充),则SQL查询将如下所示:

INSERT INTO target_table SELECT col1,NULL,col2,col3,NULL,NULL,col4 FROM source_table

Is there some simple way, either on the SQL side or on the R side, to append a data.frame to an existing table that has more columns? The missing columns should just be filled in with NAs. Bonus points if it gracefully handles table 2 having more columns than table 1?

library(RSQLite)

# Create
db <- dbConnect( SQLite(), dbname="~/temp/test.sqlite" )

# Write test
set.seed(1)
n <- 1000
testDat <- data.frame(key=seq(n), x=runif(n),y=runif(n),g1=sample(letters[1:10],n,replace=TRUE),g2=rep(letters[1:10],each=n/10),g3=factor( sample(letters[1:10],n,replace=TRUE) ))
if(dbExistsTable(db,"test")) dbRemoveTable(db,"test")
dbWriteTable( conn = db, name = "test", value = testDat, row.names=FALSE )
testDat2 <- data.frame( key=seq(n+1,n+100), x=runif(100) )
> dbWriteTable( conn = db, name="test", value = testDat2, row.names=FALSE, append=TRUE  )
[1] FALSE
Warning message:
In value[[3L]](cond) :
  RS-DBI driver: (error in statement: table test has 6 columns but 2 values were supplied)

I could envision a wrapper for this as well. The algorithm would look something like:

  1. Read 1 row from existing SQL table.
  2. Get column names from that read.
  3. Add column names to data.frame with the non-included names; fill with missings.
  4. dbWriteTable now that the data.frame has the same columns as the SQL table.

That's simplified by the fact that SQLite has mutable class. But I'd rather not reinvent the wheel if it already exists.

EDIT

Just a note to clarify: these datasets are large. The SQL database will be about 30GB, and the data.frame (actually a data.table for obvious reasons) is about 4GB. So solutions that require reading the SQL table into R are non-starters.

An alternative algorithm would be to do it in SQL:

  1. Write R data.frame to a temporary SQL table.
  2. SQL magic to append that table onto the main SQL table.
  3. Delete temporary SQL table.
  4. Profit.

解决方案

One solution would be to read one table in pieces of, say, 1000 rows using dbSendQuery and add them in the other table (adding required columns).

res <- dbSendQuery(con, "SELECT * from tests")
while(!dbHasCompleted(res)) {
  data <- fetch(res, n = 1000);
  ### Put the data in the other table
}

There is also a way to do it in a single SQLite query. If you know which columns to add (fill with NULL) the SQL query would look like this:

INSERT INTO target_table SELECT col1,NULL,col2,col3,NULL,NULL,col4 FROM source_table

这篇关于当添加的列较少时,追加到现有的SQLite表,而无需将数据库读入R的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆