Dplyr summarise_each to aggregate results

Question

I have a data frame:

    metric1    metric2    metric3 field1 field2
1   1.07809668  4.2569882  7.1710095      L     S1
2   0.56174763  1.2660273 -0.3751915      L     S2
3   1.17447327  5.5186679 11.6868322      L     S2
4   0.32830724 -0.8374830  1.8973718      S     S2
5  -0.51213503 -0.3076640 10.0730274      S     S1
6   0.24133119  2.7984703 15.9622215      S     S1
7   1.96664414  0.1818531  2.7416768      S     S3
8   0.06669409  3.8652075 10.5066330      S     S3
9   1.14660437  8.5703119  3.4294062      L     S4
10 -0.72785683  9.3320762  1.3827989      L     S4

I am showing 2 fields but have several more. I need to sum the metrics grouped by each field, e.g. for field1:

DF %>% group_by(field1) %>% summarise_each(funs(sum),metric1,metric2,metric3)

I can do this for each field, where the columns would be sum(metric1), sum(metric2), sum(metric3), but the table output I need is something like this:

              L(field1)  S(field1)  S1(field2)  S2(field2)  S3(field2)  S4(field2)
sum(metric1)
sum(metric2)
sum(metric3)

I believe there must be a way to do this using tidyr along with dplyr, but I cannot figure it out.

Recommended answer

Try recast from the reshape2 package:

library(reshape2)
recast(DF, variable ~ field1 + field2, sum)
#   variable     L_S1      L_S2       L_S4       S_S1       S_S2      S_S3
# 1  metric1 1.078097  1.736221  0.4187475 -0.2708038  0.3283072  2.033338
# 2  metric2 4.256988  6.784695 17.9023881  2.4908063 -0.8374830  4.047061
# 3  metric3 7.171010 11.311641  4.8122051 26.0352489  1.8973718 13.248310

The recast call is the same as

dcast(melt(DF, c("field1", "field2")), variable ~ field1 + field2, sum)
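To try the two calls on the question's data, here is a minimal, self-contained sketch. The DF below is retyped from the printed table in the question, and it assumes field1 and field2 are factor (or character) columns so that recast can guess them as id variables; the names via_recast and via_dcast are only illustrative.

```r
library(reshape2)

# Rebuild the example data from the question (values copied from the printed table).
DF <- data.frame(
  metric1 = c(1.07809668, 0.56174763, 1.17447327, 0.32830724, -0.51213503,
              0.24133119, 1.96664414, 0.06669409, 1.14660437, -0.72785683),
  metric2 = c(4.2569882, 1.2660273, 5.5186679, -0.8374830, -0.3076640,
              2.7984703, 0.1818531, 3.8652075, 8.5703119, 9.3320762),
  metric3 = c(7.1710095, -0.3751915, 11.6868322, 1.8973718, 10.0730274,
              15.9622215, 2.7416768, 10.5066330, 3.4294062, 1.3827989),
  field1 = c("L", "L", "L", "S", "S", "S", "S", "S", "L", "L"),
  field2 = c("S1", "S2", "S2", "S2", "S1", "S1", "S3", "S3", "S4", "S4"),
  stringsAsFactors = TRUE  # assumption: the fields are factors
)

# One-step version: melt and cast in a single call, summing duplicates.
via_recast <- recast(DF, variable ~ field1 + field2, sum)

# Two-step version: melt to long form explicitly, then cast with sum.
via_dcast <- dcast(melt(DF, c("field1", "field2")),
                   variable ~ field1 + field2, sum)

# Errors if the two results differ.
stopifnot(isTRUE(all.equal(via_recast, via_dcast)))
```

Passing sum as the aggregation function matters here because several rows share the same field1/field2 combination; without it, dcast falls back to counting rows.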

You can also combine dcast with tidyr::gather if you want, but you can't use tidyr::spread because it doesn't have a fun.aggregate argument:

library(dplyr)  # provides %>%
library(tidyr)  # provides gather()
DF %>%
  gather(variable, value, -(field1:field2)) %>%
  dcast(variable ~ field1 + field2, sum)
#   variable     L_S1      L_S2       L_S4       S_S1       S_S2      S_S3
# 1  metric1 1.078097  1.736221  0.4187475 -0.2708038  0.3283072  2.033338
# 2  metric2 4.256988  6.784695 17.9023881  2.4908063 -0.8374830  4.047061
# 3  metric3 7.171010 11.311641  4.8122051 26.0352489  1.8973718 13.248310
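The outputs above use one column per field1/field2 combination, whereas the question sketches one column per level of each field. The following is a hedged sketch of both layouts using only tidyr, and it is not part of the original answer. It assumes a tidyr version that provides pivot_longer and pivot_wider and accepts a single function in values_fn (1.1.0 or later, as far as I know), and that field1 and field2 are character columns. The names long, combined and per_field are illustrative.

```r
library(dplyr)
library(tidyr)

# Example data retyped from the question's printed table.
DF <- data.frame(
  metric1 = c(1.07809668, 0.56174763, 1.17447327, 0.32830724, -0.51213503,
              0.24133119, 1.96664414, 0.06669409, 1.14660437, -0.72785683),
  metric2 = c(4.2569882, 1.2660273, 5.5186679, -0.8374830, -0.3076640,
              2.7984703, 0.1818531, 3.8652075, 8.5703119, 9.3320762),
  metric3 = c(7.1710095, -0.3751915, 11.6868322, 1.8973718, 10.0730274,
              15.9622215, 2.7416768, 10.5066330, 3.4294062, 1.3827989),
  field1 = c("L", "L", "L", "S", "S", "S", "S", "S", "L", "L"),
  field2 = c("S1", "S2", "S2", "S2", "S1", "S1", "S3", "S3", "S4", "S4"),
  stringsAsFactors = FALSE  # assumption: the fields are plain character columns
)

# Long form: one row per (observation, metric).
long <- DF %>%
  pivot_longer(c(metric1, metric2, metric3),
               names_to = "variable", values_to = "value")

# Same layout as the dcast result: one column per field1_field2 combination.
combined <- long %>%
  pivot_wider(id_cols = variable, names_from = c(field1, field2),
              values_from = value, values_fn = sum)

# Layout sketched in the question: one column per level of each field,
# named <level>_<field>, e.g. L_field1 or S2_field2.
per_field <- long %>%
  pivot_longer(c(field1, field2), names_to = "field", values_to = "level") %>%
  pivot_wider(id_cols = variable, names_from = c(level, field),
              values_from = value, values_fn = sum)

per_field
```

Stacking field1 and field2 into a single field/level pair means each metric value is counted once per field, which is what a separate group_by on each field would produce. Further fields can be added to the second pivot_longer call.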
