Using dplyr to add summary rows
Question
Gender Year.10 Year.11 Year.12 Year.13 Year.10.1 Year.11.1 Year.12.1 Year.13.1
1 FEMALE 1181980 1113480 1040960 1033150 1116220 1059850 1022950 974490
2 MALE 674020 783150 571170 594330 641620 767590 554290 563670
3 UNSPECIFIED 31930 7740 14670 17420 31930 5590 9170 17420
I have summed up my data (the totals above) and want to add these sums underneath their respective Genders in my data.frame. For instance, after my FEMALE rows I want to add a row with the FEMALE sums (with Age == "All"). I think I can accomplish this using dplyr, but I don't know how to select positions in the data.frame.
I think it would have to be something along the lines of:
dummy.c %>%
  group_by(Gender) %>%
  rowwise() %>%
  mutate()  #### It's here where I don't know how to go on
I used summarise_each(funs(sum = sum(., na.rm = TRUE)), starts_with("Year")) to generate my summary, so I probably have to put that in my mutate().
> dput(dummy.c[sample(1:nrow(dummy.c), 15, replace=FALSE),])
structure(list(Gender = structure(c(1L, 1L, 2L, 2L, 3L, 2L, 2L,
2L, 3L, 3L, 1L, 2L, 2L, 3L, 3L), .Label = c("FEMALE", "MALE",
"UNSPECIFIED"), class = "factor"), Age = structure(c(5L, 3L,
3L, 4L, 7L, 6L, 5L, 8L, 6L, 8L, 2L, 7L, 1L, 5L, 2L), .Label = c("0-2",
"3-9", "10-19", "20-39", "40-59", "60-64", "65+", "UNSP"), class = "factor"),
Year.10 = c(484770L, 58440L, 58570L, 200290L, NA, 54780L,
238600L, NA, 2470L, 10920L, 46890L, 63360L, 16900L, 12850L,
NA), Year.11 = c(439860L, 92870L, 60060L, 264280L, NA, 54400L,
258820L, NA, NA, NA, 30150L, 84750L, 22380L, NA, 2150L),
Year.12 = c(454200L, 50900L, 55600L, 230460L, 3610L, 47960L,
148530L, NA, NA, 5500L, 18020L, 64810L, 2260L, 5560L, NA),
Year.13 = c(412650L, 84110L, 38000L, 205600L, NA, 40600L,
185770L, 5670L, NA, 5700L, 19060L, 79150L, 5860L, NA, NA),
Gender.1 = structure(c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 3L,
3L, 1L, 2L, 2L, 3L, 3L), .Label = c("FEMALE", "MALE", "UNSPECIFIED"
), class = "factor"), Age.1 = structure(c(5L, 3L, 3L, 4L,
7L, 6L, 5L, 8L, 6L, 8L, 2L, 7L, 1L, 5L, 2L), .Label = c("0-2",
"3-9", "10-19", "20-39", "40-59", "60-64", "65+", "UNSP"), class = c("ordered",
"factor")), Year.10.1 = c(460340L, 52680L, 58570L, 197110L,
NA, 54780L, 226110L, NA, 2470L, 10920L, 41370L, 52240L, 16900L,
12850L, NA), Year.11.1 = c(417800L, 87280L, 60060L, 264280L,
NA, 54400L, 248810L, NA, NA, NA, 30150L, 79200L, 22380L,
NA, NA), Year.12.1 = c(447400L, 50900L, 50100L, 224700L,
3610L, 47960L, 148530L, NA, NA, NA, 18020L, 59190L, 2260L,
5560L, NA), Year.13.1 = c(395700L, 78440L, 36520L, 200050L,
NA, 37360L, 185770L, 5670L, NA, 5700L, 19060L, 62130L, 5860L,
NA, NA)), .Names = c("Gender", "Age", "Year.10", "Year.11",
"Year.12", "Year.13", "Gender.1", "Age.1", "Year.10.1", "Year.11.1",
"Year.12.1", "Year.13.1"), row.names = c(5L, 3L, 11L, 12L, 22L,
14L, 13L, 16L, 21L, 23L, 2L, 15L, 9L, 20L, 17L), class = "data.frame")
Edit 1: displaying the data
> head(dummy.c)
Gender Age Year.10 Year.11 Year.12 Year.13 Gender.1 Age.1 Year.10.1 Year.11.1 Year.12.1 Year.13.1
1 FEMALE 0-2 13700 2470 7820 2100 FEMALE 0-2 13700 2470 7820 2100
2 FEMALE 3-9 46890 30150 18020 19060 FEMALE 3-9 41370 30150 18020 19060
3 FEMALE 10-19 58440 92870 50900 84110 FEMALE 10-19 52680 87280 50900 78440
4 FEMALE 20-39 380610 387080 291930 371290 FEMALE 20-39 356070 372370 280720 356500
5 FEMALE 40-59 484770 439860 454200 412650 FEMALE 40-59 460340 417800 447400 395700
6 FEMALE 60-64 80090 76670 92750 60710 FEMALE 60-64 80090 76670 92750 49240
> tail(dummy.c)
Gender Age Year.10 Year.11 Year.12 Year.13 Gender.1 Age.1 Year.10.1 Year.11.1 Year.12.1
18 UNSPECIFIED 10-19 5690 NA NA NA UNSPECIFIED 10-19 5690 NA NA
19 UNSPECIFIED 20-39 NA 5590 NA 11720 UNSPECIFIED 20-39 NA 5590 NA
20 UNSPECIFIED 40-59 12850 NA 5560 NA UNSPECIFIED 40-59 12850 NA 5560
21 UNSPECIFIED 60-64 2470 NA NA NA UNSPECIFIED 60-64 2470 NA NA
22 UNSPECIFIED 65+ NA NA 3610 NA UNSPECIFIED 65+ NA NA 3610
23 UNSPECIFIED UNSP 10920 NA 5500 5700 UNSPECIFIED UNSP 10920 NA NA
Year.13.1
18 NA
19 11720
20 NA
21 NA
22 NA
23 5700
Recommended answer
An alternative using the reshape2 and dplyr packages (both by the same author, by the way):
library(reshape2)
library(dplyr)
df <- structure(...) # the dput() structure from your question
Now, melt and cast:
df %>%
  select(-Gender.1, -Age.1) %>%
  melt %>% # melt the data.frame; inspect the result up to here for a better understanding
  dcast(Gender + Age ~ variable, sum, na.rm=TRUE, margins = "Age")
What's going on, line by line:
- start from your data.frame
- remove the duplicated columns (Gender.1 and Age.1)
- melt the data.frame into long format. Inspect the result up to this point for a better understanding.
- aggregate by Gender, Age and variable (the Year columns, in this case). The aggregation function is sum with na.rm=TRUE. Then margins = "Age" adds the totals over Age within each Gender, as desired.
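The steps above can be tried on a tiny made-up data frame first. This is only a sketch: the `toy` data and its values are invented for illustration and are not taken from dummy.c, and `id.vars` is passed explicitly so that melt does not have to guess the identifier columns.

```r
library(reshape2)

# Hypothetical toy data with the same shape as the question's data.
toy <- data.frame(
  Gender  = c("FEMALE", "FEMALE", "MALE", "MALE"),
  Age     = c("0-2", "3-9", "0-2", "3-9"),
  Year.10 = c(10L, 20L, 5L, NA),
  Year.11 = c(1L, NA, 2L, 3L),
  stringsAsFactors = FALSE
)

# Long format: one row per Gender / Age / Year column.
long <- melt(toy, id.vars = c("Gender", "Age"))

# Back to wide format, summing values and adding an "(all)" row per Gender.
res <- dcast(long, Gender + Age ~ variable, sum, na.rm = TRUE, margins = "Age")
print(res)
```

Looking at `long` before the cast shows why the aggregation works: every Year column has become a value of `variable`, so sum is applied once per Gender, Age and year. On the 15-row sample from the question, the answer's own pipeline produces the output that follows.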
Result:
Gender Age Year.10 Year.11 Year.12 Year.13 Year.10.1 Year.11.1 Year.12.1 Year.13.1
1 FEMALE 3-9 46890 30150 18020 19060 41370 30150 18020 19060
2 FEMALE 10-19 58440 92870 50900 84110 52680 87280 50900 78440
3 FEMALE 40-59 484770 439860 454200 412650 460340 417800 447400 395700
4 FEMALE (all) 590100 562880 523120 515820 554390 535230 516320 493200
5 MALE 0-2 16900 22380 2260 5860 16900 22380 2260 5860
...
17 UNSPECIFIED UNSP 10920 0 5500 5700 10920 0 0 5700
18 UNSPECIFIED (all) 26240 2150 14670 5700 26240 0 9170 5700
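Since the question asked for dplyr specifically, here is a possible dplyr-only sketch of the same idea. It is not part of the original answer. It assumes dplyr 1.0 or later (for `across()`, which supersedes `summarise_each()`), and the `toy` data frame is invented for illustration; replace it with dummy.c after dropping Gender.1 and Age.1.

```r
library(dplyr)

# Hypothetical toy data with the same shape as the question's data.
toy <- data.frame(
  Gender  = c("FEMALE", "FEMALE", "MALE", "MALE"),
  Age     = c("0-2", "3-9", "0-2", "3-9"),
  Year.10 = c(10L, 20L, 5L, NA),
  Year.11 = c(1L, NA, 2L, 3L),
  stringsAsFactors = FALSE
)

# One total row per Gender, labelled Age == "All".
totals <- toy %>%
  group_by(Gender) %>%
  summarise(across(starts_with("Year"), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  mutate(Age = "All")

# Append the totals, then sort so that each total row comes
# after the detail rows of its own Gender.
with_totals <- bind_rows(toy, totals) %>%
  arrange(Gender, Age == "All")
print(with_totals)
```

Note that if Age is a factor, as it is in dummy.c, it would need to be converted with as.character() (or given an extra "All" level) before the totals are bound on.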
Hope that helps.