Summarize with conditions in dplyr


Question



I'll illustrate my question with an example.

Sample data:

df <- data.frame(ID = c(1, 1, 2, 2, 3, 5), A = c("foo", "bar", "foo", "foo", "bar", "bar"), B = c(1, 5, 7, 23, 54, 202))

df
  ID   A   B
1  1 foo   1
2  1 bar   5
3  2 foo   7
4  2 foo  23
5  3 bar  54
6  5 bar 202

What I want to do is to summarize, by ID, the sum of B and the sum of B when A is "foo". I can do this in a couple steps like:

require(magrittr)
require(dplyr)

df1 <- df %>%
  group_by(ID) %>%
  summarize(sumB = sum(B))

df2 <- df %>%
  filter(A == "foo") %>%
  group_by(ID) %>%
  summarize(sumBfoo = sum(B))

left_join(df1, df2)

  ID sumB sumBfoo
1  1    6       1
2  2   30      30
3  3   54      NA
4  5  202      NA

However, I'm looking for a more elegant and faster way, since I'm dealing with 10 GB+ of out-of-memory data in SQLite.

require(sqldf)
my_db <- src_sqlite("my_db.sqlite3", create = TRUE)
df_sqlite <- copy_to(my_db, df)

I thought of using mutate to define a new Bfoo column:

df_sqlite %>%
  mutate(Bfoo = ifelse(A == "foo", B, 0))

Unfortunately, this doesn't work on the database end of things.

Error in sqliteExecStatement(conn, statement, ...) : 
  RS-DBI driver: (error in statement: no such function: IFELSE)
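
The error message suggests that `ifelse()` reached SQLite as a literal `IFELSE()` call, which SQLite does not define. In plain SQL, a conditional sum is usually written with `CASE WHEN`. The following is only a sketch of that idea, run with `sqldf` against the small in-memory `df` rather than the real database; the result name `res_sql` and the `ORDER BY` clause are additions for illustration.

```r
library(sqldf)

# Same sample data as in the question
df <- data.frame(
  ID = c(1, 1, 2, 2, 3, 5),
  A  = c("foo", "bar", "foo", "foo", "bar", "bar"),
  B  = c(1, 5, 7, 23, 54, 202)
)

# One pass over the table: total of B, and total of B where A is 'foo'.
# CASE WHEN replaces B with 0 on rows where A is not 'foo'.
res_sql <- sqldf("
  SELECT ID,
         SUM(B) AS sumB,
         SUM(CASE WHEN A = 'foo' THEN B ELSE 0 END) AS sumBfoo
  FROM df
  GROUP BY ID
  ORDER BY ID
")

print(res_sql)
```

Unlike the `left_join()` version, this gives 0 rather than NA for IDs that have no "foo" rows.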

Solution

Writing up @hadley's comment as an answer:

df_sqlite %>%
  group_by(ID) %>%
  mutate(Bfoo = if (A == "foo") B else 0) %>%
  summarize(sumB = sum(B),
            sumBfoo = sum(Bfoo)) %>%
  collect()
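
The `if ... else` form above is aimed at the database backend, where dplyr translates the expression to SQL. On an ordinary in-memory data frame, R's `if` expects a single TRUE/FALSE value and does not work element-wise, so a vectorised `ifelse()` is the usual substitute. A minimal sketch of the same single-pass summary on the sample data, assuming the data fits in memory (the name `res` is illustrative):

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  ID = c(1, 1, 2, 2, 3, 5),
  A  = c("foo", "bar", "foo", "foo", "bar", "bar"),
  B  = c(1, 5, 7, 23, 54, 202)
)

# Compute both sums in one grouped summary instead of two summaries and a join.
# ifelse() keeps B where A is "foo" and substitutes 0 elsewhere.
res <- df %>%
  group_by(ID) %>%
  summarize(
    sumB    = sum(B),
    sumBfoo = sum(ifelse(A == "foo", B, 0))
  )

print(res)
```

As with the database version, IDs without any "foo" rows get a `sumBfoo` of 0 here, not the NA produced by the `left_join()` approach in the question.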
