Is there a way to use dplyr::bind_rows without collecting data frames from the database?
Question
Is there a way to use bind_rows() on a set of data frames without first collecting them from the database?
Say I've defined a couple of dplyr query tables:
mydatabase <- src_mysql('database')
table1 <- tbl(mydatabase,"table1")
table2 <- tbl(mydatabase,"table3")
foo <- table1 %>% filter(id > 10) %>% select(id)
bar <- table2 %>% select(id)
I'd like to be able to stack foo and bar together. In essence, I'd like to perform a union of the two subqueries without having to drop down to SQL. However, when I try that, I get an error, because I'm trying to combine two tbl_sql objects rather than real data frames:
unioned_data_frame <- bind_rows(foo,bar)
Error: incompatible sizes (1 != 8)
Any suggestions? In this toy example, writing the whole query in SQL wouldn't be a problem, but of course, in real life, foo and bar are often significantly more complicated.
Recommended Answer
Using dplyr::union() will perform an SQL UNION, although it's important to note that dplyr::union() removes duplicate rows (as the SQL version does). Using dplyr::union_all() keeps duplicate rows, like bind_rows().
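A minimal sketch of the union_all() approach, assuming the dbplyr, DBI and RSQLite packages are installed. An in-memory SQLite database with small made-up id tables stands in for the asker's MySQL tables; the combined query stays lazy (it is only SQL) until collect() is called.

```r
library(dplyr)
library(dbplyr)  # SQL backend used by dplyr verbs on database tables

# Hypothetical stand-in for the MySQL source: an in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "table1", data.frame(id = c(5L, 11L, 12L)))
DBI::dbWriteTable(con, "table3", data.frame(id = c(11L, 20L)))

foo <- tbl(con, "table1") %>% filter(id > 10) %>% select(id)
bar <- tbl(con, "table3") %>% select(id)

# Both results are still lazy queries; no rows have been fetched yet
all_rows      <- union_all(foo, bar)  # UNION ALL: duplicates kept, like bind_rows()
distinct_rows <- union(foo, bar)      # UNION: duplicate rows removed

show_query(all_rows)  # prints the generated SQL

# Pull the combined rows into R only at the end
all_df      <- collect(all_rows)
distinct_df <- collect(distinct_rows)
DBI::dbDisconnect(con)
```

With this data, id 11 appears in both tables, so all_df keeps it twice while distinct_df keeps it once. Since UNION also makes the database deduplicate, union_all() is the closer analogue of bind_rows().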
Unfortunately, there isn't a way to get the benefits of bind_rows() this way, particularly its very useful .id argument.
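bind_rows() itself still won't accept lazy tables, but one common approximation of its .id argument (a workaround, not an equivalent) is to tag each query with a constant column before stacking them with union_all(). The sketch below assumes dbplyr, DBI and RSQLite are installed, uses an in-memory SQLite database with made-up data, and picks a hypothetical column name, source, for the label.

```r
library(dplyr)
library(dbplyr)  # SQL backend used by dplyr verbs on database tables

# Hypothetical in-memory SQLite database standing in for the real source
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "table1", data.frame(id = c(5L, 11L, 12L)))
DBI::dbWriteTable(con, "table3", data.frame(id = c(11L, 20L)))

foo <- tbl(con, "table1") %>% filter(id > 10) %>% select(id)
bar <- tbl(con, "table3") %>% select(id)

# Add a constant label column to each query, roughly mimicking
# bind_rows(list(foo = ..., bar = ...), .id = "source");
# the string literals are translated to SQL, so this stays lazy
tagged <- union_all(
  foo %>% mutate(source = "foo"),
  bar %>% mutate(source = "bar")
)

tagged_df <- collect(tagged)
DBI::dbDisconnect(con)
```

Unlike .id, the labels have to be written out by hand for each query, but the whole combination still runs in the database.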