在dplyr中用转换后的数据替换SQL数据库表 [英] Replacing SQL database table with transformed data in dplyr

查看:57
本文介绍了在dplyr中用转换后的数据替换SQL数据库表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在dplyr中转换数据时,我经常替换数据,尤其是在处理大型数据集时.当我使用SQL支持的数据集时,至少在使用SQLite时,我不确定如何优雅地做到这一点.

I frequently replace data as I transform it in dplyr, especially when working with large datasets. I'm not sure how to do this elegantly when I am working with SQL-backed datasets, at least not with SQLite.

在dplyr数据库小插曲中或在SO上找不到关于此目标的任何讨论,这也让我想知道我最初所做的事情是否有问题.但是,这似乎是处理大型数据集的自然方法.

I could not find any discussion of this goal in the dplyr DB vignettes or on SO, which also makes me wonder if there's something wrong with what I'm doing in the first place; however, it seems to be a natural way to work with large datasets.

无论如何,最直观的方法行不通:

At any rate, the most intuitive approach does not work:

library(dplyr)
library(RSQLite)

db2 <- src_sqlite("trouble.sqlite", create = TRUE)
trouble <- data.frame(values = c(5, 1, 3))
trouble.db <- copy_to(db2, trouble, temporary = FALSE)

collect(trouble.db) # 5, 3, 1

trouble.db <- trouble.db %>% arrange(values)
collect(trouble.db) # 1, 3, 5

trouble.in <- tbl(db2, sql("SELECT * from trouble"))
collect(trouble.in) # 5, 3, 1

就地复制的另一种直观语法给出表已存在"错误:

Another intuitive syntax for in-place copy gives a "table already exists" error:

trouble.db <- copy_to(db2, as.data.frame(trouble.db), name="trouble", temporary = FALSE)

一种解决方案是手动删除表并重建它,这就是我一直在做的事情:

One solution is to manually drop the table and rebuild it, which is what I've been doing:

db2$con %>% db_drop_table(table = "trouble")
trouble <- collect(trouble.db)
trouble.db <- copy_to(db2, trouble, temporary = FALSE)

另一种方法是放弃替换并创建一系列临时表,我觉得这些表不美观,但我认为这可能是推荐的范例:

Another one is to give up on replacement and create a series of temporary tables, which I find unaesthetic but which I suppose might be the recommended paradigm:

trouble_temp <- data.frame(values = c(5, 1, 3))
trouble_temp.db <- copy_to(db2, trouble_temp, temporary = TRUE)
trouble <- trouble.db %>% arrange(values)
trouble.db <- copy_to(db2, trouble, temporary = FALSE)

我怀疑拖放并复制"将成为答案,但是出于对精美解决方案的热爱,我想问一下是否有更好的方法.

I suspect that "drop and re-copy" is going to wind up being the answer, but out of an abundance of love for beautiful solutions, I thought I would ask if there is a better way.

推荐答案

此年之后找到的任何人.

For anyone finding this years after.

声明

trouble.db %>% arrange(values)

创建一个SQL查询,该查询将发送到数据库并在您收集结果时执行.

Creates an SQL query that's sent to the database and executed when you collect the result.

我们可以看到这样的SQL

We can see that SQL like this

trouble.db %>% arrange(values) %>% show_query()

SELECT *
FROM `trouble`
ORDER BY `values`

很显然,这样的查询无法修改其查询的实际数据.

Clearly a query like this can't modify the actual data it's querying.

要修改数据,我们可以使用 DBI 包中的 dbWriteTable 函数

To modify the data, we can use dbWriteTable function from the DBI package

library(dplyr)
library(RSQLite)

# Will use this connection object for all our DB interactions
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

trouble <- data.frame(values = c(5, 1, 3))
trouble.db <- copy_to(con, trouble, temporary = FALSE)

collect(trouble.db) # 5, 3, 1

# This is just a query
trouble.db <- trouble.db %>% arrange(values)
collect(trouble.db) # 1, 3, 5

# The data shouldn't be modified yet
trouble.in <- tbl(con, sql("SELECT * from trouble"))
collect(trouble.in) # 5, 3, 1

# Now we are modifying the data
DBI::dbWriteTable(
  con, "trouble", collect(trouble.db),
  overwrite = TRUE
)

tbl(con, sql("SELECT * from trouble")) %>% collect() # 1, 3, 5

这篇关于在dplyr中用转换后的数据替换SQL数据库表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆