Using dplyr to add summary rows

Problem description

       Gender Year.10 Year.11 Year.12 Year.13 Year.10.1 Year.11.1 Year.12.1 Year.13.1
1      FEMALE 1181980 1113480 1040960 1033150   1116220   1059850   1022950    974490
2        MALE  674020  783150  571170  594330    641620    767590    554290    563670
3 UNSPECIFIED   31930    7740   14670   17420     31930      5590      9170     17420

I have summed up my data and want to add these sums underneath their respective genders in my data.frame. For instance, after my female rows I want to add the female sums directly below them (with Age == "All"). I think I can accomplish this using dplyr, but I don't know how to select positions in the data.frame.

I think it would have to be something along the lines of:

dummy.c %>%
group_by(Gender) %>%
rowwise () %>%
mutate () #### It's here where I don't know where to go on from 

I used summarise_each(funs(sum=sum(., na.rm=TRUE)), starts_with("Year")) to generate my summary; I probably have to put that inside my mutate().
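For reference, summarise_each() and funs() are deprecated in current dplyr. A minimal sketch of the same per-gender sums with summarise() and across() (dplyr 1.0.0 or later), using a small hypothetical data frame toy in place of dummy.c:

```r
library(dplyr)

# Hypothetical toy data standing in for dummy.c (only two Year columns)
toy <- data.frame(
  Gender  = c("FEMALE", "FEMALE", "MALE", "MALE"),
  Age     = c("0-2", "3-9", "0-2", "3-9"),
  Year.10 = c(100L, 200L, 10L, NA),
  Year.11 = c(1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

# One row per Gender; every Year.* column is summed with NAs dropped
sums <- toy %>%
  group_by(Gender) %>%
  summarise(across(starts_with("Year"), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(sums)
```

Note that it is a grouped summarise(), not mutate(), that collapses each gender to a single row; rowwise() is not needed for column sums.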

> dput(dummy.c[sample(1:nrow(dummy.c), 15, replace=FALSE),])
structure(list(Gender = structure(c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 
2L, 3L, 3L, 1L, 2L, 2L, 3L, 3L), .Label = c("FEMALE", "MALE", 
"UNSPECIFIED"), class = "factor"), Age = structure(c(5L, 3L, 
3L, 4L, 7L, 6L, 5L, 8L, 6L, 8L, 2L, 7L, 1L, 5L, 2L), .Label = c("0-2", 
"3-9", "10-19", "20-39", "40-59", "60-64", "65+", "UNSP"), class = "factor"), 
Year.10 = c(484770L, 58440L, 58570L, 200290L, NA, 54780L, 
238600L, NA, 2470L, 10920L, 46890L, 63360L, 16900L, 12850L, 
NA), Year.11 = c(439860L, 92870L, 60060L, 264280L, NA, 54400L, 
258820L, NA, NA, NA, 30150L, 84750L, 22380L, NA, 2150L), 
Year.12 = c(454200L, 50900L, 55600L, 230460L, 3610L, 47960L, 
148530L, NA, NA, 5500L, 18020L, 64810L, 2260L, 5560L, NA), 
Year.13 = c(412650L, 84110L, 38000L, 205600L, NA, 40600L, 
185770L, 5670L, NA, 5700L, 19060L, 79150L, 5860L, NA, NA), 
Gender.1 = structure(c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 
3L, 1L, 2L, 2L, 3L, 3L), .Label = c("FEMALE", "MALE", "UNSPECIFIED"
), class = "factor"), Age.1 = structure(c(5L, 3L, 3L, 4L, 
7L, 6L, 5L, 8L, 6L, 8L, 2L, 7L, 1L, 5L, 2L), .Label = c("0-2", 
"3-9", "10-19", "20-39", "40-59", "60-64", "65+", "UNSP"), class = c("ordered", 
"factor")), Year.10.1 = c(460340L, 52680L, 58570L, 197110L, 
NA, 54780L, 226110L, NA, 2470L, 10920L, 41370L, 52240L, 16900L, 
12850L, NA), Year.11.1 = c(417800L, 87280L, 60060L, 264280L, 
NA, 54400L, 248810L, NA, NA, NA, 30150L, 79200L, 22380L, 
NA, NA), Year.12.1 = c(447400L, 50900L, 50100L, 224700L, 
3610L, 47960L, 148530L, NA, NA, NA, 18020L, 59190L, 2260L, 
5560L, NA), Year.13.1 = c(395700L, 78440L, 36520L, 200050L, 
NA, 37360L, 185770L, 5670L, NA, 5700L, 19060L, 62130L, 5860L, 
NA, NA)), .Names = c("Gender", "Age", "Year.10", "Year.11", 
"Year.12", "Year.13", "Gender.1", "Age.1", "Year.10.1", "Year.11.1", 
"Year.12.1", "Year.13.1"), row.names = c(5L, 3L, 11L, 12L, 22L, 
14L, 13L, 16L, 21L, 23L, 2L, 15L, 9L, 20L, 17L), class = "data.frame")

Edit 1: showing the data

> head(dummy.c)

  Gender   Age Year.10 Year.11 Year.12 Year.13 Gender.1 Age.1 Year.10.1 Year.11.1 Year.12.1 Year.13.1
1 FEMALE   0-2   13700    2470    7820    2100   FEMALE   0-2     13700      2470      7820      2100
2 FEMALE   3-9   46890   30150   18020   19060   FEMALE   3-9     41370     30150     18020     19060
3 FEMALE 10-19   58440   92870   50900   84110   FEMALE 10-19     52680     87280     50900     78440
4 FEMALE 20-39  380610  387080  291930  371290   FEMALE 20-39    356070    372370    280720    356500
5 FEMALE 40-59  484770  439860  454200  412650   FEMALE 40-59    460340    417800    447400    395700
6 FEMALE 60-64   80090   76670   92750   60710   FEMALE 60-64     80090     76670     92750     49240

> tail(dummy.c)

        Gender   Age Year.10 Year.11 Year.12 Year.13    Gender.1 Age.1 Year.10.1 Year.11.1 Year.12.1 Year.13.1
18 UNSPECIFIED 10-19    5690      NA      NA      NA UNSPECIFIED 10-19      5690        NA        NA        NA
19 UNSPECIFIED 20-39      NA    5590      NA   11720 UNSPECIFIED 20-39        NA      5590        NA     11720
20 UNSPECIFIED 40-59   12850      NA    5560      NA UNSPECIFIED 40-59     12850        NA      5560        NA
21 UNSPECIFIED 60-64    2470      NA      NA      NA UNSPECIFIED 60-64      2470        NA        NA        NA
22 UNSPECIFIED   65+      NA      NA    3610      NA UNSPECIFIED   65+        NA        NA      3610        NA
23 UNSPECIFIED  UNSP   10920      NA    5500    5700 UNSPECIFIED  UNSP     10920        NA        NA      5700


Recommended answer

An alternative using the reshape2 and dplyr packages (both by the same author, btw =P):

library(dplyr)
library(reshape2)
df <- structure(...) # structure of your question

Now, melt and cast:

df %>% 
  select(-Gender.1, -Age.1) %>% 
  melt %>% # melting the data.frame. Look at the result at this point to better understand it.
  dcast(Gender + Age ~ variable, sum, na.rm = TRUE, margins = "Age")

What's going on, line by line:


  1. your data.frame
  2. removing the duplicated columns (Gender.1 and Age.1)
  3. melting the data.frame. Look at the result at this point to better understand it.
  4. aggregating by Gender, Age and variable (the Years, in this case). The aggregation function is sum with na.rm=TRUE. Then margins = "Age" adds a total row over Age, labelled "(all)", for each Gender, as desired.
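To see the melt and cast steps on something small, here is a sketch on a made-up four-row data frame (toy, long and wide are hypothetical names, and the reshape2 package is assumed to be installed):

```r
library(reshape2)

# Hypothetical toy data: two genders, two age bands, two Year columns
toy <- data.frame(
  Gender  = factor(c("FEMALE", "FEMALE", "MALE", "MALE")),
  Age     = factor(c("0-2", "3-9", "0-2", "3-9")),
  Year.10 = c(100L, 200L, 10L, NA),
  Year.11 = c(1L, 2L, 3L, 4L)
)

# Long format: one row per (Gender, Age, variable) with the number in "value"
long <- melt(toy, id.vars = c("Gender", "Age"))
print(head(long))

# Back to wide, summing each cell and adding an "(all)" Age row per Gender
wide <- dcast(long, Gender + Age ~ variable, sum, na.rm = TRUE, margins = "Age")
print(wide)
```

The NA for MALE 3-9 in Year.10 becomes 0 because sum(NA, na.rm = TRUE) is 0, which is also where the zeros in the answer's result come from.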

Result:

        Gender   Age Year.10 Year.11 Year.12 Year.13 Year.10.1 Year.11.1 Year.12.1 Year.13.1
1       FEMALE   3-9   46890   30150   18020   19060     41370     30150     18020     19060
2       FEMALE 10-19   58440   92870   50900   84110     52680     87280     50900     78440
3       FEMALE 40-59  484770  439860  454200  412650    460340    417800    447400    395700
4       FEMALE (all)  590100  562880  523120  515820    554390    535230    516320    493200
5         MALE   0-2   16900   22380    2260    5860     16900     22380      2260      5860
...
17 UNSPECIFIED  UNSP   10920       0    5500    5700     10920         0         0      5700
18 UNSPECIFIED (all)   26240    2150   14670    5700     26240         0      9170      5700 
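If you would rather stay within dplyr, as the question asked, one alternative sketch (not part of the original answer) is to compute the totals separately, label them Age = "All", and stack them back with bind_rows(). The names toy, totals and with_totals are hypothetical:

```r
library(dplyr)

# Hypothetical toy data standing in for dummy.c
toy <- data.frame(
  Gender  = c("FEMALE", "FEMALE", "MALE", "MALE"),
  Age     = c("0-2", "3-9", "0-2", "3-9"),
  Year.10 = c(100L, 200L, 10L, NA),
  Year.11 = c(1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

# Per-gender totals, tagged with Age = "All"
totals <- toy %>%
  group_by(Gender) %>%
  summarise(across(starts_with("Year"), ~ sum(.x, na.rm = TRUE)), .groups = "drop") %>%
  mutate(Age = "All")

# Totals are appended after all detail rows, so sorting by Gender and then
# by original position puts each "All" row last within its gender
with_totals <- bind_rows(toy, totals) %>%
  mutate(.row = row_number()) %>%
  arrange(Gender, .row) %>%
  select(-.row)

print(with_totals)
```

With the real data you would drop Gender.1 and Age.1 first, as the answer does. Sorting on the original position avoids relying on the alphabetical order of the Age labels, which would put "10-19" before "3-9".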

Hope it helps.
