Combining two subgroups of data in the same dataframe
I have a dataframe like this:
df = data.frame(time=c(2010:2015,2010:2015),
variable=c(rep("a",6),rep("b",6)),
value=c(rnorm(6),rnorm(6,mean=10)))
or:
time variable value
1 2010 a -0.5472416
...
6 2015 a -0.2433123
7 2010 b 9.8617777
...
12 2015 b 7.7674609
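I need to create a new variable c = b - a. The best solution I've found is to use the packages dplyr and tidyr:
library(dplyr)
library(tidyr)
# spread()/gather() still work, but tidyr >= 1.0.0 supersedes them with pivot_wider()/pivot_longer()
df <- spread(df, variable, value) %>%   # one column per subgroup: a, b
  mutate(c = b - a) %>%                 # difference between the subgroups
  gather(variable, value, a:c) %>%      # back to long format
  filter(variable == "c")               # keep only the new variable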
which gives the requested outcome:
time variable value
1 2010 c 10.444794
2 2011 c 8.121627
...
6 2015 c 10.589378
Is there a more direct way to obtain the same result, one that does not require first spreading and then gathering the dataframe?
You could use group_by and summarize:
c <- df %>%
group_by(time) %>%
summarize(value = diff(value))
Note that this assumes the a's come before the b's in the data frame. If you're not sure, you can add an arrange(variable) before the group_by.
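If relying on row order feels fragile, one option is to pick the two values by name inside summarize instead of using diff. The following is a minimal sketch rather than part of the original answer: it assumes every year has exactly one "a" row and one "b" row, and the seed, the reversed row order, and the name `result` are illustrative choices.

```r
library(dplyr)

set.seed(1)  # illustrative seed so the example is reproducible

# Same shape as the question's data, but with the "b" rows listed first
# to show that the result does not depend on row order.
df <- data.frame(time = c(2010:2015, 2010:2015),
                 variable = c(rep("b", 6), rep("a", 6)),
                 value = c(rnorm(6, mean = 10), rnorm(6)))

result <- df %>%
  group_by(time) %>%
  # Select each subgroup's value explicitly; assumes one "a" and one "b" per year.
  summarize(value = value[variable == "b"] - value[variable == "a"])

print(result)
```

This trades the brevity of diff(value) for an explicit statement of which subgroup is subtracted from which.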
If one variable could have years that aren't in the other (as in your comment), you could get rid of those cases by adding an extra step:
c <- df %>%
group_by(time) %>%
filter(n() == 2) %>%
summarize(value = diff(value))
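To see what the extra filter step does, here is a small sketch on made-up data where the two subgroups cover different years. The data frame `df2`, its values, and the name `c2` are hypothetical and chosen only so the result is easy to check by hand; the arrange(variable) call is included so that diff computes b - a regardless of input order.

```r
library(dplyr)

# Hypothetical data: "a" covers 2010-2015, "b" covers 2011-2016,
# so 2010 and 2016 each appear in only one subgroup.
df2 <- data.frame(time = c(2010:2015, 2011:2016),
                  variable = c(rep("a", 6), rep("b", 6)),
                  value = c(1:6, 11:16))

c2 <- df2 %>%
  arrange(variable) %>%            # make sure "a" rows precede "b" rows
  group_by(time) %>%
  filter(n() == 2) %>%             # drop years present in only one subgroup
  summarize(value = diff(value))   # b - a for each remaining year

print(c2)
```

Only the years 2011 to 2015 remain, because 2010 and 2016 have a single row each and are removed before the difference is computed.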