Calculate difference between values in consecutive rows by group


Question

This is my df (a data.frame):

group value
1     10
1     20
1     25
2     5
2     10
2     15 

I need to calculate the difference between values in consecutive rows, by group.

So I need the following result:

group value diff
1     10    NA # because there is no previous value
1     20    10 # value[2] - value[1]
1     25    5  # value[3] - value[2]
2     5     NA # because the group has changed
2     10    5  # value[5] - value[4]
2     15    5  # value[6] - value[5]

I can handle this with ddply, but it takes too much time because my df contains a lot of groups (over 1,000,000).

Is there a more efficient way to do this?

Recommended answer

The data.table package can do this fairly quickly using the shift function.

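library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10, 20, 25, 5, 10, 15))
# setDT(df)  # use this instead if df is already a data.frame

# shift(value) lags value by one row within each group (the first row becomes NA)
df[ , diff := value - shift(value), by = group]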
#   group value diff
#1:     1    10   NA
#2:     1    20   10
#3:     1    25    5
#4:     2     5   NA
#5:     2    10    5
#6:     2    15    5
setDF(df)  # optional: convert back to a plain data.frame
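If adding a package is not an option, the same per-group lag difference can be sketched in base R with ave(). This is not part of the original answer; it is a minimal illustration that rebuilds the example data, and it has not been benchmarked on data with a very large number of groups.

```r
# Example data matching the question, rebuilt so the block is self-contained
df <- data.frame(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# ave() applies FUN to `value` within each level of `group`
# and returns the results in the original row order.
# c(NA, diff(x)) prepends NA because the first row of a group has no predecessor.
df$diff <- ave(df$value, df$group, FUN = function(x) c(NA, diff(x)))

print(df)
```

Because ave() calls the function once per group, it may still be slow with over a million groups; the data.table approach above is likely the better fit at that scale.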


Or using dplyr:

library(dplyr)

# lag(value) returns the previous row's value within each group
df %>%
    group_by(group) %>%
    mutate(Diff = value - lag(value))
#   group value  Diff
#   <int> <int> <int>
# 1     1    10    NA
# 2     1    20    10
# 3     1    25     5
# 4     2     5    NA
# 5     2    10     5
# 6     2    15     5
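Since the question stresses the number of groups, a fully vectorized variant that avoids any per-group function call may also be worth trying. The sketch below is not from the original answer and rests on one assumption: the rows are already ordered so that each group occupies a contiguous block. The variable names (d, first_in_group) are only illustrative.

```r
# Example data matching the question, rebuilt so the block is self-contained
df <- data.frame(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# Assumption: rows are sorted so that each group is contiguous
n <- nrow(df)

# Difference from the previous row, ignoring groups for now
d <- c(NA, diff(df$value))

# TRUE on the first row overall and wherever the group differs from the previous row
first_in_group <- c(TRUE, df$group[-1] != df$group[-n])

# The first row of each group has no predecessor within its group
d[first_in_group] <- NA
df$diff <- d

print(df)
```

If the data is not sorted by group, it would need to be ordered first (for example with order(df$group)), otherwise differences would be taken across unrelated rows.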


For alternatives that predate data.table::shift and dplyr::lag, see the edit history of this answer.
