Plotting binned data using sum instead of count


Question

I've searched for an answer, but can't find one that does the job for me.

I have a dataset (data) with two variables: people's ages (age) and their number of awards (awards).

My objective is to plot the number of awards against age in R. FYI, a person can have multiple awards, and several people can have the same age.

I tried plotting a histogram and a barplot, but both count the number of observations instead of summing the number of awards.

Sample dataset:

age <- c(21,22,22,25,30,34,45,26,37,46,49,21)
awards <- c(0,3,2,1,0,0,1,3,1,1,1,1)
data <- data.frame(cbind(age,awards))

What I'm looking for is a histogram (or barplot) that represents this data.

Ideally, I'd want the ages split into age groups (for example 20-30, 31-40, 41-50), with the total number of awards shown for each group.

The age group would be on the x-axis and the total number of awards for each age group would be on the y-axis.

Thanks!

Recommended answer

We can use the aggregate function to sum the awards and then plot the result with the ggplot2 package. I don't make many barplots in base R these days, so I'm not sure of the best way to do it without loading ggplot2:

#data
set.seed(123)
dat <- data.frame(age = sample(20:50, 200, replace = TRUE),
                  awards = rpois(200, 3))
head(dat)
#   age awards
# 1  28      2
# 2  44      6
# 3  32      3
# 4  47      3
# 5  49      2
# 6  21      5

By age

#aggregate

sum_by_age <- aggregate(awards ~ age, data = dat, FUN = sum)

library(ggplot2)

ggplot(sum_by_age, aes(x = age, y = awards))+
    geom_bar(stat = 'identity')

#create groups

dat$age_group <- ifelse(dat$age <= 30, '20-30',
                        ifelse(dat$age <= 40, '31-40',
                               '41+'))

sum_by_age_group <- aggregate(awards ~ age_group, data = dat, FUN = sum)

ggplot(sum_by_age_group, aes(x = age_group, y = awards))+
    geom_bar(stat = 'identity')
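As an alternative to the nested ifelse() calls above, here is a minimal sketch that bins the ages with base R's cut() and then sums the awards per bin, applied to the sample data from the question. The break points, the labels, and the names dat_q and sum_q are assumptions chosen for illustration to match the groups the question asks for.

```r
# Sample data from the question
age <- c(21, 22, 22, 25, 30, 34, 45, 26, 37, 46, 49, 21)
awards <- c(0, 3, 2, 1, 0, 0, 1, 3, 1, 1, 1, 1)
dat_q <- data.frame(age, awards)

# Bin ages into (19,30], (30,40], (40,50]; the break points are an assumption
dat_q$age_group <- cut(dat_q$age,
                       breaks = c(19, 30, 40, 50),
                       labels = c("20-30", "31-40", "41-50"))

# Sum the awards within each bin
sum_q <- aggregate(awards ~ age_group, data = dat_q, FUN = sum)
sum_q
#   age_group awards
# 1     20-30     10
# 2     31-40      1
# 3     41-50      3
```

Because cut() returns an ordered set of factor levels, the groups keep their natural order on the x-axis when sum_q is passed to ggplot.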

We could skip the aggregate step altogether, because geom_bar(stat = 'identity') stacks rows that share the same x value, so each bar's height is the sum:

ggplot(dat, aes(x = age, y = awards)) + geom_bar(stat = 'identity')

but I prefer not to do it that way, because an intermediate aggregated data frame can be useful elsewhere in your analysis pipeline, for comparisons other than visualization.
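For readers who want to stay in base R, the following is one possible sketch rather than a definitive approach: tapply() sums the awards per age group and barplot() draws the totals. It uses the sample data from the question; the break points, labels, and axis titles are assumptions.

```r
# Sample data from the question
age <- c(21, 22, 22, 25, 30, 34, 45, 26, 37, 46, 49, 21)
awards <- c(0, 3, 2, 1, 0, 0, 1, 3, 1, 1, 1, 1)

# Age bins; break points and labels are assumptions matching the question
age_group <- cut(age,
                 breaks = c(19, 30, 40, 50),
                 labels = c("20-30", "31-40", "41-50"))

# Total awards per bin, returned as a named vector
totals <- tapply(awards, age_group, sum)

# Bar heights are the summed awards, not the number of observations
barplot(totals, xlab = "Age group", ylab = "Total awards")
```

Here totals plays the same role as the intermediate aggregated data frame, so it can be reused for other comparisons.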
