How to find the difference between values in two rows of an R data frame using dplyr

Question

I have an R dataframe such as:

df <- data.frame(period=rep(1:4,2), 
                 farm=c(rep('A',4),rep('B',4)), 
                 cumVol=c(1,5,15,31,10,12,16,24),
                 other = 1:8)

  period farm cumVol other
1      1    A      1     1
2      2    A      5     2
3      3    A     15     3
4      4    A     31     4
5      1    B     10     5
6      2    B     12     6
7      3    B     16     7
8      4    B     24     8

How do I find the change in cumVol at each farm in each period, ignoring the 'other' column? I would like a dataframe like this (optionally with the cumVol column remaining):

  period farm volume other
1      1    A      0     1
2      2    A      4     2
3      3    A     10     3
4      4    A     16     4
5      1    B      0     5
6      2    B      2     6
7      3    B      4     7
8      4    B      8     8

In practice there may be many 'farm'-like columns and many 'other'-like (i.e., ignored) columns. I'd like to be able to specify all the column names using variables.

I am using the dplyr package.

Answer

In dplyr:

library(dplyr)
df %>%
  group_by(farm) %>%
  mutate(volume = cumVol - lag(cumVol, default = cumVol[1]))

Source: local data frame [8 x 5]
Groups: farm

  period farm cumVol other volume
1      1    A      1     1      0
2      2    A      5     2      4
3      3    A     15     3     10
4      4    A     31     4     16
5      1    B     10     5      0
6      2    B     12     6      2
7      3    B     16     7      4
8      4    B     24     8      8
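
The question also asks how to pass the column names in variables. Below is a minimal sketch of one way to do that. It assumes dplyr >= 1.0.0 (for `across()`) and tidyselect's `all_of()`. The variable names `group_cols` and `cum_cols`, and the `vol_` prefix on the output columns, are hypothetical choices made for illustration.

```r
library(dplyr)

df <- data.frame(period = rep(1:4, 2),
                 farm   = c(rep('A', 4), rep('B', 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other  = 1:8)

# Hypothetical variables holding the grouping ("farm"-like) and cumulative column names
group_cols <- c("farm")
cum_cols   <- c("cumVol")

result <- df %>%
  group_by(across(all_of(group_cols))) %>%
  # For each cumulative column, subtract the previous row within the group.
  # The first row of each group gets 0, as in the answer above.
  mutate(across(all_of(cum_cols),
                ~ .x - lag(.x, default = first(.x)),
                .names = "vol_{.col}")) %>%
  ungroup()

print(result)
```

Columns not named in either variable, such as `other`, pass through unchanged. Adding more names to `group_cols` or `cum_cols` extends the grouping or produces one `vol_` column per cumulative column.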

Perhaps the desired output should actually be as follows?

df %>%
  group_by(farm) %>%
  mutate(volume = cumVol - lag(cumVol, default = 0))

  period farm cumVol other volume
1      1    A      1     1      1
2      2    A      5     2      4
3      3    A     15     3     10
4      4    A     31     4     16
5      1    B     10     5     10
6      2    B     12     6      2
7      3    B     16     7      4
8      4    B     24     8      8
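
With `default = 0`, the first row of each farm keeps its full cumulative value. A per-farm running sum of `volume` should therefore rebuild `cumVol`. The sketch below checks that property. It also passes `order_by = period` to `lag()`, so the result does not depend on the rows already being sorted by period. The shuffled row order is invented purely for illustration.

```r
library(dplyr)

df <- data.frame(period = rep(1:4, 2),
                 farm   = c(rep('A', 4), rep('B', 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other  = 1:8)

# Deliberately scramble the rows (illustrative order only)
shuffled <- df[c(8, 3, 5, 1, 7, 2, 6, 4), ]

out <- shuffled %>%
  group_by(farm) %>%
  # order_by = period makes lag() look at the previous *period*, not the previous row
  mutate(volume = cumVol - lag(cumVol, default = 0, order_by = period)) %>%
  ungroup() %>%
  arrange(farm, period)

# Running total of the increments within each farm should reproduce cumVol
rebuilt <- out %>%
  group_by(farm) %>%
  mutate(rebuilt = cumsum(volume)) %>%
  ungroup()

print(out)
```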

Following up on your comments, I think you are looking for arrange(). If that is not the case, it might be best to start a new question.

df1 <- data.frame(period = rep(1:4, 4),
                  farm = rep(c(rep('A', 4), rep('B', 4)), 2),
                  crop = c(rep('apple', 8), rep('pear', 8)),
                  cumCropVol = c(1, 5, 15, 31, 10, 12, 16, 24, 11, 15, 25, 31, 20, 22, 26, 34),
                  other = rep(1:8, 2))
df1 %>% 
  arrange(desc(period), desc(farm)) %>%
  group_by(period, farm) %>% 
  summarise(cumVol=sum(cumCropVol))

Follow-up #2

df1 <- data.frame(period = rep(1:4, 4),
                  farm = rep(c(rep('A', 4), rep('B', 4)), 2),
                  crop = c(rep('apple', 8), rep('pear', 8)),
                  cumCropVol = c(1, 5, 15, 31, 10, 12, 16, 24, 11, 15, 25, 31, 20, 22, 26, 34),
                  other = rep(1:8, 2))
df <- df1 %>% 
  arrange(desc(period), desc(farm)) %>% 
  group_by(period, farm) %>% 
  summarise(cumVol=sum(cumCropVol))

ungroup(df) %>% 
  arrange(farm) %>%
  group_by(farm) %>% 
  mutate(volume = cumVol - lag(cumVol, default = 0))

Source: local data frame [8 x 4]
Groups: farm

  period farm cumVol volume
1      1    A     12     12
2      2    A     20      8
3      3    A     40     20
4      4    A     62     22
5      1    B     30     30
6      2    B     34      4
7      3    B     42      8
8      4    B     58     16
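
The ungroup/arrange/group_by round trip can also be folded into a single pipeline. This sketch assumes dplyr >= 1.0.0, which added the `.groups` argument of `summarise()`. The result name `farm_vol` is arbitrary.

```r
library(dplyr)

df1 <- data.frame(period = rep(1:4, 4),
                  farm = rep(c(rep('A', 4), rep('B', 4)), 2),
                  crop = c(rep('apple', 8), rep('pear', 8)),
                  cumCropVol = c(1, 5, 15, 31, 10, 12, 16, 24,
                                 11, 15, 25, 31, 20, 22, 26, 34),
                  other = rep(1:8, 2))

farm_vol <- df1 %>%
  # Grouping by farm first means summarise() returns rows sorted by farm, then period
  group_by(farm, period) %>%
  # Sum the crops per farm and period; "drop_last" leaves the result grouped by farm
  summarise(cumVol = sum(cumCropVol), .groups = "drop_last") %>%
  # lag() therefore runs within each farm, in period order
  mutate(volume = cumVol - lag(cumVol, default = 0)) %>%
  ungroup()

print(farm_vol)
```

The values match the output above. The only difference is that `farm` comes before `period`, because that is the order of the grouping columns.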
