Calculate difference between values in consecutive rows by group
Question
This is my df (a data.frame):
group value
1 10
1 20
1 25
2 5
2 10
2 15
I need to calculate the difference between values in consecutive rows, by group.
So I need this result:
group value diff
1 10 NA # because there is no previous value
1 20 10 # value[2] - value[1]
1 25 5 # value[3] - value[2]
2 5 NA # because the group has changed
2 10 5 # value[5] - value[4]
2 15 5 # value[6] - value[5]
Although I can handle this problem with ddply, it takes too much time, because my df has a lot of groups (over 1,000,000).
Is there a more efficient way to handle this problem?
Recommended answer
The data.table package can do this fairly quickly, using the shift function.
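library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10, 20, 25, 5, 10, 15))
# setDT(df)  # use this instead if df is already a data.frame
# shift(value) is the previous row's value within each group (NA for the first row)
df[ , diff := value - shift(value), by = group]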
# group value diff
#1: 1 10 NA
#2: 1 20 10
#3: 1 25 5
#4: 2 5 NA
#5: 2 10 5
#6: 2 15 5
setDF(df)  # optional: convert back to a plain data.frame
Or using dplyr:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(Diff = value - lag(value))  # lag(value) is the previous row's value within the group
# group value Diff
# <int> <int> <int>
# 1 1 10 NA
# 2 1 20 10
# 3 1 25 5
# 4 2 5 NA
# 5 2 10 5
# 6 2 15 5
For alternatives that predate data.table::shift and dplyr::lag, see the edit history of this answer.
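As an illustration of what a package-free alternative can look like, here is a minimal base R sketch. It is not necessarily what the earlier edits of the answer contained. The first variant uses ave() to apply diff() within each group. The second is fully vectorized and assumes that the rows of each group are contiguous (for example, the data is sorted by group), which may matter when there are very many groups. The column names diff2 and first_in_group are made up for this example.

```r
# Base R sketch; no packages required.
df <- data.frame(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# Variant 1: ave() applies the function to `value` within each group
# and returns the results in the original row order.
df$diff <- ave(df$value, df$group, FUN = function(x) c(NA, diff(x)))

# Variant 2: vectorized. ASSUMPTION: rows of the same group are contiguous.
# Take the difference over the whole column, then set the first row of
# every group to NA, since it has no previous value in its own group.
n <- nrow(df)
first_in_group <- c(TRUE, df$group[-1] != df$group[-n])
df$diff2 <- c(NA, diff(df$value))
df$diff2[first_in_group] <- NA

print(df)
```

Both columns should match the diff column requested in the question. If the groups are not contiguous, sort by group first (keeping the within-group order) or use the first variant.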