Dplyr: filter last entry for date in a series
Problem description
I want to keep only the last date entry in a time series for every month that has more than one entry.
As an example, take a table like this:
```r
obs <- c("A", "B", "A", "B", "A", "B", "A", "B")
date <- c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01",
          "2017-03-01", "2017-03-01", "2017-03-02", "2017-03-02")
num <- c(1000, 1800, 2000, 2900, 3000, 3400, 3500, 3400)
# keep date as character (already the default from R 4.0.0 onwards)
dat <- data.frame(obs, date, num, stringsAsFactors = FALSE)
```
```
  obs       date  num
1   A 2017-01-01 1000
2   B 2017-01-01 1800
3   A 2017-02-01 2000
4   B 2017-02-01 2900
5   A 2017-03-01 3000
6   B 2017-03-01 3400
7   A 2017-03-02 3500
8   B 2017-03-02 3400
```
A simple selection for "A" would be:
```r
library(dplyr)      # filter(), select(), mutate(), %>%
library(lubridate)  # ymd()

x <- dat %>%
  filter(obs == "A") %>%
  select(obs, date, num) %>%
  mutate(date = ymd(date))
```
```
  obs       date  num
1   A 2017-01-01 1000
2   A 2017-02-01 2000
3   A 2017-03-01 3000
4   A 2017-03-02 3500
```
So there are now two entries for the third month, and I would like to keep only the most recent one for that month. I thought this would be straightforward, so I did:
```r
x <- dat %>%
  filter(obs == "A") %>%
  select(obs, date, num) %>%
  mutate(date = ymd(date)) %>%
  arrange(date) %>%
  slice(which.max(date))
```
But I get only the single last entry, and all the other rows are removed. What am I missing? The output should be:
```
  obs       date  num
1   A 2017-01-01 1000
2   A 2017-02-01 2000
4   A 2017-03-02 3500
```
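The attempt above keeps a single row because the data frame is not grouped: which.max() returns one index for the whole date column, so slice() keeps just that one row. A minimal sketch (added here for illustration, assuming dplyr and lubridate are installed; the names a, ungrouped and grouped are hypothetical) contrasts the ungrouped call with the same slice() applied per month:

```r
library(dplyr)
library(lubridate)

# Rebuild the question's data so the example is self-contained
dat <- data.frame(
  obs  = c("A", "B", "A", "B", "A", "B", "A", "B"),
  date = c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01",
           "2017-03-01", "2017-03-01", "2017-03-02", "2017-03-02"),
  num  = c(1000, 1800, 2000, 2900, 3000, 3400, 3500, 3400),
  stringsAsFactors = FALSE
)

a <- dat %>%
  filter(obs == "A") %>%
  mutate(date = ymd(date))

# Ungrouped: which.max() sees the whole column and returns ONE index,
# so slice() keeps exactly one row (the 2017-03-02 entry).
ungrouped <- a %>% slice(which.max(date))

# Grouped by month: slice() evaluates which.max(date) once per group,
# so one row survives for each month.
grouped <- a %>%
  group_by(m = month(date)) %>%
  slice(which.max(date)) %>%
  ungroup()
```

With the grouping in place, the same slice(which.max(date)) call returns the three rows the question expects.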
Recommended answer
You need to group by `month(date)` (together with `obs`) and then filter for the last date in each group:
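```r
dat %>%
  filter(obs == "A") %>%
  mutate(date = ymd(date)) %>%
  # one group per observation and calendar month
  group_by(obs, m = month(date)) %>%
  # keep the row(s) holding the latest date in each group
  filter(date == max(date))
#   obs date        num m
# 1 A   2017-01-01 1000 1
# 2 A   2017-02-01 2000 2
# 3 A   2017-03-02 3500 3
```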
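month(date) returns only the month number (1 to 12), so if the series spans several years, March 2017 and March 2018 end up in the same group and only the later entry survives. A minimal sketch, assuming lubridate's floor_date() and dplyr 1.0.0 or later (for slice_max()), groups on the first day of each month instead and processes every obs at once. The 2018 row is hypothetical, added only to show the year boundary, and last_per_month and by_month_only are illustrative names:

```r
library(dplyr)
library(lubridate)

# Question's data plus one hypothetical 2018 row for "A"
dat <- data.frame(
  obs  = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  date = c("2017-01-01", "2017-01-01", "2017-02-01", "2017-02-01",
           "2017-03-01", "2017-03-01", "2017-03-02", "2017-03-02",
           "2018-03-05"),
  num  = c(1000, 1800, 2000, 2900, 3000, 3400, 3500, 3400, 4000),
  stringsAsFactors = FALSE
)

last_per_month <- dat %>%
  mutate(date = ymd(date)) %>%
  # floor_date() maps each date to the first day of its month,
  # so the grouping key carries the year as well as the month.
  group_by(obs, month_start = floor_date(date, unit = "month")) %>%
  # keep the single latest row in each obs/month group
  slice_max(date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(obs, date)

# For contrast: grouping on month(date) alone merges March 2017 and
# March 2018 for "A", so the 2017-03-02 row for "A" is dropped.
by_month_only <- dat %>%
  mutate(date = ymd(date)) %>%
  group_by(obs, m = month(date)) %>%
  filter(date == max(date)) %>%
  ungroup()
```

slice_max(..., with_ties = FALSE) keeps exactly one row per group, whereas filter(date == max(date)) keeps every row that ties for the latest date; which behaviour is preferable depends on whether duplicate timestamps can occur in the data.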