diff operation within a group, after a dplyr::group_by()
Question
Let's say I have this data.frame (with 3 variables):
ID Period Score
123 2013 146
123 2014 133
23 2013 150
456 2013 205
456 2014 219
456 2015 140
78 2012 192
78 2013 199
78 2014 133
78 2015 170
Using dplyr, I can group the rows by ID and keep only the IDs that appear more than once:
data <- data %>% group_by(ID) %>% filter(n() > 1)
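For anyone who wants to reproduce this, the table above can be rebuilt as a data frame and run through the same filter. This is only a sketch: the values are copied from the question, while the name `multi` and the trailing `ungroup()` are illustrative additions, not part of the original post.

```r
library(dplyr)

# Sample data reconstructed from the table in the question.
data <- data.frame(
  ID     = c(123, 123, 23, 456, 456, 456, 78, 78, 78, 78),
  Period = c(2013, 2014, 2013, 2013, 2014, 2015, 2012, 2013, 2014, 2015),
  Score  = c(146, 133, 150, 205, 219, 140, 192, 199, 133, 170)
)

# Keep only the IDs that occur in more than one row.
# `multi` is a hypothetical name used here for illustration.
multi <- data %>%
  group_by(ID) %>%
  filter(n() > 1) %>%
  ungroup()

nrow(multi)       # 9: the single row for ID 23 is dropped
unique(multi$ID)  # 123 456 78
```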
Now, what I would like to achieve is to add a column that is:
Difference = Score of Period P - Score of Period P-1
to get something like this:
ID Period Score Difference
123 2013 146
123 2014 133 -13
456 2013 205
456 2014 219 14
456 2015 140 -79
78 2012 192
78 2013 199 7
78 2014 133 -66
78 2015 170 37
It is rather trivial to do this in a spreadsheet, but I have no idea how to achieve this in R.
Thanks for any help or guidance.
Recommended answer
Here is another solution using lag. Depending on the use case, it might be more convenient than diff, because the NAs clearly show that a particular value has no predecessor, whereas a 0 in a diff-based solution might be the result of either a) a missing predecessor or b) a subtraction between two periods with equal scores.
data %>%
  group_by(ID) %>%
  filter(n() > 1) %>%
  mutate(
    Difference = Score - lag(Score)
  )
# ID Period Score Difference
# 1 123 2013 146 NA
# 2 123 2014 133 -13
# 3 456 2013 205 NA
# 4 456 2014 219 14
# 5 456 2015 140 -79
# 6 78 2012 192 NA
# 7 78 2013 199 7
# 8 78 2014 133 -66
# 9 78 2015 170 37
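The ambiguity described above can be seen side by side in a small sketch. The data here is made up for illustration (it is not from the question), and `c(0, diff(Score))` is an assumption about how a diff-based version would pad the first row of each group, since base `diff()` returns one element fewer than its input. The rows are also sorted by Period within each ID first, because both `lag()` and `diff()` operate on row order.

```r
library(dplyr)

# Hypothetical data: ID 1 has two consecutive periods with the same score,
# and its rows are deliberately out of order.
scores <- data.frame(
  ID     = c(1, 1, 1, 2, 2),
  Period = c(2014, 2013, 2015, 2013, 2014),
  Score  = c(150, 150, 140, 205, 219)
)

result <- scores %>%
  group_by(ID) %>%
  arrange(Period, .by_group = TRUE) %>%  # lag() and diff() depend on row order
  mutate(
    DiffPadded = c(0, diff(Score)),      # assumed padding: first row becomes 0
    DiffLag    = Score - lag(Score)      # first row of each group becomes NA
  ) %>%
  ungroup()

result$DiffPadded  # 0  0 -10  0 14
result$DiffLag     # NA 0 -10 NA 14
```

In the padded column, the first two rows of ID 1 are both 0 even though only the second is a real difference; in the lag column the first row is NA, so the two cases stay distinguishable.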