data.table或dplyr - 数据操作 [英] data.table or dplyr - data manipulation

查看:103
本文介绍了data.table或dplyr - 数据操作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下数据

Date           Col1       Col2
2014-01-01     123        12
2014-01-01     123        21
2014-01-01     124        32
2014-01-01     125        32
2014-01-02     123        34
2014-01-02     126        24
2014-01-02     127        23
2014-01-03     521        21
2014-01-03     123        13
2014-01-03     126        15

现在,我想计算 Col1 日期(在前一天没有重复),并添加到上一个计数。例如,

Now, I want to count unique values in Col1 for the each date (that did not repeat in previous date), and add to the previous count. For example,

Date           Count
2014-01-01       3 i.e. 123,124,125
2014-01-02       5 (2 + above 3) i.e. 126, 127
2014-01-03       6 (1 + above 5) i.e. 521 only


推荐答案

library(dplyr)
df %.% 
  arrange(Date) %.% 
  filter(!duplicated(Col1)) %.% 
  group_by(Date) %.% 
  summarise(Count=n()) %.% # n() <=> length(Date)
  mutate(Count = cumsum(Count))
# Source: local data frame [3 x 2]
# 
#         Date Count
# 1 2014-01-01     3
# 2 2014-01-02     5
# 3 2014-01-03     6

library(data.table)
dt <- data.table(df, key="Date")
dt <- unique(dt, by="Col1")
(dt <- dt[, list(Count=.N), by=Date][, Count:=cumsum(Count)])
#          Date Count
# 1: 2014-01-01     3
# 2: 2014-01-02     5
# 3: 2014-01-03     6

dt <- data.table(df, key="Date")
dt <- unique(dt, by="Col1")
dt[, .N, by=Date][, Count:=cumsum(N)]


$ b b

.N 会自动命名为 N (无点)如果需要,可以在下一个操作中同时使用 .N N

.N is named N (no dot) automatically for convenience in chained operations like this, so you can use both .N and N together in the next operation if need be.

这篇关于data.table或dplyr - 数据操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆