Use dplyr to summarize but preserve date of group row
Problem description
I have a data frame like the following:
Date Flare Painmed_Use
1 2015-12-01 0 0
2 2015-12-02 0 0
3 2015-12-03 0 0
4 2015-12-04 0 0
5 2015-12-05 0 0
6 2015-12-06 0 1
7 2015-12-07 1 4
8 2015-12-08 1 3
9 2015-12-09 1 1
10 2015-12-10 1 0
11 2015-12-11 0 0
12 2015-12-12 0 0
13 2015-12-13 1 2
14 2015-12-14 1 3
15 2015-12-15 1 1
16 2015-12-16 0 0
I'm trying to find the length of each flare, as well as the total pain medication use during each flare, using dplyr. My current solution (inspired by "Use rle to group by runs when using dplyr"),
df %>%
group_by(yy = {yy = rle(Flare); rep(seq_along(yy$lengths), yy$lengths)}, Flare) %>%
summarize(Painmed_UseCum = sum(Painmed_Use),FlareLength = n())
gives the following output:
yy Flare Painmed_UseCum FlareLength
<int> <int> <dbl> <int>
1 1 0 1 6
2 2 1 8 4
3 3 0 0 2
4 4 1 6 3
5 5 0 0 1
This is almost exactly what I need. However, I can't figure out how to preserve other columns, the critical one being the date that corresponds to the last row of a particular flare. So the output I'm seeking is the same as above, but with the addition of the Date column, like so:
Date yy Flare Painmed_UseCum FlareLength
<int> <int> <dbl> <int>
1 2015-12-06 1 0 1 6
2 2015-12-10 2 1 8 4
3 2015-12-12 3 0 0 2
4 2015-12-15 4 1 6 3
5 2015-12-16 5 0 0 1
Note: In some ways this is a follow-up to a previous question of mine ("R code to get max count of time series data by group"). My attempt to keep that question simpler, though perhaps useful to others, ended up necessitating this further question.
Recommended answer
You can take the maximum Date of each group inside summarize:
library(dplyr)
df %>%
  # Number each run of identical Flare values; keep Flare as a grouping column
  group_by(yy = {yy = rle(Flare); rep(seq_along(yy$lengths), yy$lengths)}, Flare) %>%
  # max(Date) picks the latest date in each run, i.e. its last row when rows are in date order
  summarize(Painmed_UseCum = sum(Painmed_Use), FlareLength = n(), Date = max(Date))
# yy Flare Painmed_UseCum FlareLength Date
#1 1 0 1 6 2015-12-06
#2 2 1 8 4 2015-12-10
#3 3 0 0 2 2015-12-12
#4 4 1 6 3 2015-12-15
#5 5 0 0 1 2015-12-16
Or, if there are more columns to preserve, a better approach is to use mutate and then select the last row in each group.
df %>%
group_by(yy = {yy = rle(Flare); rep(seq_along(yy$lengths), yy$lengths)}) %>%
mutate(Painmed_UseCum = sum(Painmed_Use),FlareLength = n()) %>%
slice(n())
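For anyone who wants to try the mutate-and-slice approach end to end, here is a minimal sketch on the sample data from the question. The construction of df is my assumption (Date stored as class Date, the other two columns as numeric), and the trailing ungroup() is an addition so that later steps are not silently grouped by yy.

```r
library(dplyr)

# Sample data from the question (column types are assumed)
df <- data.frame(
  Date = seq(as.Date("2015-12-01"), by = "day", length.out = 16),
  Flare = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0),
  Painmed_Use = c(0, 0, 0, 0, 0, 1, 4, 3, 1, 0, 0, 0, 2, 3, 1, 0)
)

result <- df %>%
  # One group id per run of identical Flare values
  group_by(yy = {r <- rle(Flare); rep(seq_along(r$lengths), r$lengths)}) %>%
  # Attach the per-run totals to every row of the run
  mutate(Painmed_UseCum = sum(Painmed_Use), FlareLength = n()) %>%
  # Keep only the last row of each run, with all original columns intact
  slice(n()) %>%
  ungroup()

print(result)
```

Because mutate keeps every column, the last row of each run carries its own Date, Flare and Painmed_Use alongside the two new totals.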
To create the groups, we can replace the rle expression with rleid from data.table, which is simpler.
group_by(yy = data.table::rleid(Flare))
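To see what the grouping expression actually computes, here is a small base R sketch on the Flare column from the question. The cumsum variant is an alternative I am adding for comparison, not part of the answer; data.table::rleid(Flare) should yield the same ids.

```r
# Flare column from the question's sample data
Flare <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)

# rle() compresses the vector into runs of identical values
runs <- rle(Flare)

# Repeat each run's index by its length to get one group id per row
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Alternative without rle(): start a new id whenever the value changes
run_id_alt <- cumsum(c(TRUE, diff(Flare) != 0))

print(runs$lengths)
print(run_id)
print(identical(run_id, run_id_alt))
```

Each flare (and each gap between flares) gets its own id, which is why grouping by it separates the two flares even though both have Flare equal to 1.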