Combining two subgroups of data in the same dataframe


Problem description

I have a dataframe like this:

df = data.frame(time=c(2010:2015,2010:2015),
                variable=c(rep("a",6),rep("b",6)),
                value=c(rnorm(6),rnorm(6,mean=10)))

or:

   time variable      value 
1  2010        a -0.5472416
...
6  2015        a -0.2433123
7  2010        b  9.8617777
... 
12 2015        b  7.7674609

I need to create a new variable 'c=b-a'. The best solution I've found is to use the packages 'dplyr' and 'tidyr':

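library(dplyr)
library(tidyr)

# Widen to one column per variable, compute c, then return to long format
df <- spread(df,variable,value) %>% 
      mutate(c=b-a) %>% 
      gather(variable,value,a:c) %>%
      filter(variable=="c")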

which gives the requested outcome:

  time variable      value
1 2010        c  10.444794
2 2011        c   8.121627
...
6 2015        c  10.589378

Is there a more direct way to obtain the same result, one that does not require first "spreading" and then "gathering" the dataframe?

Solution

You could use group_by and summarize:

c <- df %>%
    group_by(time) %>%
    summarize(value = diff(value))

Note that this assumes the 'a' rows come before the 'b' rows in the data frame, so that diff(value) returns b - a within each year. If you're not sure, you can add an arrange(variable) before the group_by.
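As a minimal sketch of that ordering safeguard, the following shuffles the rows and then sorts by variable before grouping. The fixed seed and the names shuffled and res are illustrative assumptions, not part of the original question.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(time = c(2010:2015, 2010:2015),
                 variable = c(rep("a", 6), rep("b", 6)),
                 value = c(rnorm(6), rnorm(6, mean = 10)))

# Shuffle the rows so that "a" no longer reliably precedes "b"
shuffled <- df[sample(nrow(df)), ]

# Sorting by variable first puts each year's "a" row ahead of its "b" row,
# so diff(value) is b - a within every group
res <- shuffled %>%
    arrange(variable) %>%
    group_by(time) %>%
    summarize(value = diff(value))
```

Sorting before grouping makes the result independent of the row order in the input, at the cost of one extra pass over the data.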

If one variable could have years that aren't in the other (as in your comment), you could get rid of those cases by adding an extra step:

c <- df %>%
    group_by(time) %>%
    filter(n() == 2) %>%
    summarize(value = diff(value))
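
To illustrate the effect of the extra filter step, here is a small sketch on made-up data in which 'b' has no observation for 2012. The data frame df2 and its values are hypothetical and chosen only so the result is easy to check by hand.

```r
library(dplyr)

# Hypothetical data: "b" has no row for 2012
df2 <- data.frame(time = c(2010:2012, 2010:2011),
                  variable = c(rep("a", 3), rep("b", 2)),
                  value = c(1, 2, 3, 11, 13))

res2 <- df2 %>%
    group_by(time) %>%
    filter(n() == 2) %>%              # keep only years present for both variables
    summarize(value = diff(value))    # b - a, since "a" rows come first here

# res2 has rows for 2010 (11 - 1 = 10) and 2011 (13 - 2 = 11); 2012 is dropped
```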
