Combining two subgroups of data in the same dataframe

Views: 141  Posted: 2017/7/13 22:09:54  Tags: r dplyr

I have a dataframe like this:

df = data.frame(time=c(2010:2015,2010:2015),
                variable=c(rep("a",6),rep("b",6)),
                value=c(rnorm(6),rnorm(6,mean=10)))

or:

   time variable      value 
1  2010        a -0.5472416
...
6  2015        a -0.2433123
7  2010        b  9.8617777
... 
12 2015        b  7.7674609

I need to create a new variable 'c = b - a'. The best solution I've found is to use the packages 'dplyr' and 'tidyr':

library(dplyr)
library(tidyr)

df <- spread(df,variable,value) %>%
      mutate(c=b-a) %>%
      gather(variable,value,a:c) %>%
      filter(variable=="c")

which gives the requested outcome:

  time variable      value
1 2010        c  10.444794
2 2011        c   8.121627
...
6 2015        c  10.589378

Is there a more direct way to obtain the same result, one that does not require first "spreading" and then "gathering" the dataframe?

Solution

You could use group_by and summarize:

c <- df %>%
    group_by(time) %>%
    summarize(value = diff(value))

Note that this assumes the rows for a come before the rows for b in the data frame, so that diff(value) returns b - a for each year. If you're not sure of the order, you can add an arrange(variable) before the group_by.
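
As a minimal sketch of that ordering step, the example below uses made-up values in the same shape as the question's df and deliberately puts the b rows first. The names shuffled and diff_by_year are only illustrative and do not come from the original post.

```r
library(dplyr)

# Hypothetical data in the same shape as the question's df,
# with fixed values so that b - a is easy to check by eye
df <- data.frame(time = c(2010:2015, 2010:2015),
                 variable = c(rep("a", 6), rep("b", 6)),
                 value = c(1:6, 11:16))

# Reorder the rows so that the b rows now come before the a rows
shuffled <- df[c(12:7, 1:6), ]

diff_by_year <- shuffled %>%
    arrange(variable) %>%            # put a rows before b rows again
    group_by(time) %>%
    summarize(value = diff(value))   # second minus first, i.e. b - a
```

Without the arrange step, diff would compute a - b on the shuffled rows and the sign of the result would flip.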

If one variable could have years that aren't in the other (as in your comment), you could get rid of those cases by adding an extra step:

c <- df %>%
    group_by(time) %>%
    filter(n() == 2) %>%
    summarize(value = diff(value))
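
To illustrate the extra filter step, here is a small sketch on made-up data in which b has no observation for 2015. The names df_gap and diff_gap are hypothetical and used only for this example.

```r
library(dplyr)

# Hypothetical data: a covers 2010-2015, b only covers 2010-2014
df_gap <- data.frame(time = c(2010:2015, 2010:2014),
                     variable = c(rep("a", 6), rep("b", 5)),
                     value = c(1:6, 11:15))

diff_gap <- df_gap %>%
    group_by(time) %>%
    filter(n() == 2) %>%             # keep only years with exactly two rows
    summarize(value = diff(value))   # b - a, given a rows precede b rows

diff_gap$time   # 2015 is no longer present
```

Note that n() == 2 only checks the number of rows per year. It does not verify that one row belongs to a and the other to b, so this sketch assumes each variable appears at most once per year.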
