Plotting binned data using sum instead of count
Question
I've tried to search for an answer, but can't seem to find one that does the job for me.
I have a dataset (data) with two variables: people's ages (age) and number of awards (awards).
My objective is to plot the number of awards against age in R. FYI, a person can have multiple awards, and several people can have the same age.
I tried plotting a histogram and a barplot, but both count the number of observations instead of summing the number of awards.
Sample dataset:
age <- c(21,22,22,25,30,34,45,26,37,46,49,21)
awards <- c(0,3,2,1,0,0,1,3,1,1,1,1)
data <- data.frame(cbind(age,awards))
What I'm looking for is a histogram (or barplot) that represents this data.
Ideally, I'd want the ages to be split into age groups, for example 20-30, 31-40 and 41-50, with the total number of awards computed for each group.
The age group would be on the x-axis and the total number of awards for each age group would be on the y-axis.
Thanks!
Recommended answer
We can use the aggregate function and then the ggplot2 package. I don't make many barplots in base R these days, so I'm not sure of the best way to do it without loading ggplot2:
#data
set.seed(123)
dat <- data.frame(age = sample(20:50, 200, replace = TRUE),
                  awards = rpois(200, 3))
head(dat)
  age awards
1  28      2
2  44      6
3  32      3
4  47      3
5  49      2
6  21      5
By age:
#aggregate
sum_by_age <- aggregate(awards ~ age, data = dat, FUN = sum)
library(ggplot2)
ggplot(sum_by_age, aes(x = age, y = awards)) +
  geom_bar(stat = 'identity')
#create groups
# ages up to 30 fall in '20-30', so the next group starts at 31
dat$age_group <- ifelse(dat$age <= 30, '20-30',
                 ifelse(dat$age <= 40, '31-40',
                        '41 +'))
sum_by_age_group <- aggregate(awards ~ age_group, data = dat, FUN = sum)
ggplot(sum_by_age_group, aes(x = age_group, y = awards)) +
  geom_bar(stat = 'identity')
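As an alternative to the nested ifelse calls, the grouping step could be written with cut(), which bins a numeric vector into labelled intervals. The sketch below applies it to the sample data from the question; the name sample_data, the break points and the labels are assumptions chosen to match the 20-30, 31-40, 41-50 groups the question asks for.

```r
# Sample data from the question
age <- c(21, 22, 22, 25, 30, 34, 45, 26, 37, 46, 49, 21)
awards <- c(0, 3, 2, 1, 0, 0, 1, 3, 1, 1, 1, 1)
sample_data <- data.frame(age, awards)

# Bin ages into (19,30], (30,40], (40,50]; breaks and labels are assumptions
sample_data$age_group <- cut(sample_data$age,
                             breaks = c(19, 30, 40, 50),
                             labels = c("20-30", "31-40", "41-50"))

# Sum awards within each age group (not a count of rows)
sum_by_group <- aggregate(awards ~ age_group, data = sample_data, FUN = sum)
print(sum_by_group)
```

Because cut() returns a factor, the groups keep their natural order on the x-axis when sum_by_group is passed to ggplot as in the code above.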
We could skip the aggregate step altogether and just use:
ggplot(dat, aes(x = age, y = awards)) + geom_bar(stat = 'identity')
However, I prefer the aggregate approach, because an intermediate data step can be useful in your analysis pipeline for comparisons beyond visualization.
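The answer leaves open how to do this without loading ggplot2. One possible base R sketch, not necessarily the best approach, is to sum the awards per group with tapply() and pass the result to barplot(). The break points, labels and axis titles here are assumptions for illustration.

```r
# Sample data from the question
age <- c(21, 22, 22, 25, 30, 34, 45, 26, 37, 46, 49, 21)
awards <- c(0, 3, 2, 1, 0, 0, 1, 3, 1, 1, 1, 1)

# Assumed age groups: (19,30], (30,40], (40,50]
age_group <- cut(age,
                 breaks = c(19, 30, 40, 50),
                 labels = c("20-30", "31-40", "41-50"))

# Total awards per age group, returned as a named vector
totals <- tapply(awards, age_group, sum)

# Bar heights are the summed awards, not the number of observations
barplot(totals, xlab = "Age group", ylab = "Total awards")
```

The totals object also serves as the intermediate data step, so it can be reused for comparisons other than plotting.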