Density of each group of weighted geom_density sum to one

Problem description


How can I group a density plot and have the density of each group sum to one, when using weighted data?

The ggplot2 help for geom_density() suggests a hack for using weighted data: dividing by the sum of the weights. But when grouped, this means that the combined density of the groups totals one. I would like the density of each group to total one.
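
The hack only normalises the groups jointly because a weighted kernel density has a total area equal to the sum of the weights it receives. The sketch below checks this on simulated data (not the movies set). It uses base R's stats::density() as a stand-in for the estimate that geom_density() draws. The variable names and the trapezoid-rule helper `area()` are illustrative assumptions.

```r
# Made-up data, not the movies dataset: two groups with arbitrary positive weights
set.seed(1)
x <- c(rnorm(30), rnorm(70, mean = 3))
g <- rep(c(0, 1), times = c(30, 70))
w <- runif(100)

# Area under a density curve, via the trapezoid rule on its evaluation grid
area <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Weighted density area of group k; warnings are silenced because stats::density()
# may warn, e.g. when the weights passed to it do not sum to 1
group_area <- function(weights, k) {
  suppressWarnings(area(density(x[g == k], weights = weights[g == k])))
}

w_global <- w / sum(w)                               # the "divide by the total" hack
w_group  <- ave(w, g, FUN = function(v) v / sum(v))  # normalise within each group

area_global <- sapply(c(0, 1), function(k) group_area(w_global, k))
area_group  <- sapply(c(0, 1), function(k) group_area(w_group, k))

print(round(area_global, 2))  # each well below 1, but together about 1
print(round(area_group, 2))   # about 1 for each group
```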

I have found two clumsy ways to do this. The first is to treat each group as a separate dataset:

library(ggplot2)
library(ggplot2movies)  # provides the movies data (bundled with ggplot2 itself before 2.0.0)
m <- ggplot()
m + geom_density(data = movies[movies$Action == 0, ], aes(rating, weight = votes/sum(votes)), fill=NA, colour="black") +
    geom_density(data = movies[movies$Action == 1, ], aes(rating, weight = votes/sum(votes)), fill=NA, colour="blue")

Obvious disadvantages are the manual handling of factor levels and aesthetics. I also tried using the windowing functionality of the data.table package to create a new column for the total votes per Action group, dividing by that instead:

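library(data.table)
library(ggplot2)
library(ggplot2movies)  # provides the movies data
movies.dt <- data.table(movies)
setkey(movies.dt, Action)
# total votes of each Action group, repeated on every row of that group
movies.dt[, votes.per.group := sum(votes), Action]
m <- ggplot(movies.dt, aes(x=rating, weight=votes/votes.per.group, group = Action, colour = Action))
m + geom_density(fill=NA)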

Are there neater ways to do this? Because of the size of my tables, I'd rather not replicate rows according to their weights just so that I can work with plain frequencies.
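
Weights are the row-saving alternative to replication: with integer counts, normalised weights give the same kernel density as repeating each row that many times, provided the bandwidth is held fixed. A minimal sketch with made-up counts, again using stats::density() rather than ggplot2. The fixed `bw = 0.5` is an arbitrary assumption, needed because the automatic bandwidth depends on the number of rows.

```r
# Made-up counts standing in for votes: each distinct rating is "seen" n times
rating <- c(4.5, 6.0, 6.5, 7.2, 8.1)
n      <- c(3, 10, 25, 12, 4)

bw <- 0.5  # fixed bandwidth, since the automatic choice depends on the sample size

# Frequency approach: copy each row n times (memory grows with sum(n))
d_rep <- density(rep(rating, n), bw = bw)

# Weighted approach: one row per rating, weights normalised to sum to 1
d_wt <- density(rating, weights = n / sum(n), bw = bw)

print(all.equal(d_rep$y, d_wt$y))  # same curve without replicating rows
```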

Solution

I think an auxiliary table might be your only option. I had a similar problem here. The issue seems to be that, when ggplot uses aggregating functions in aes(...), it applies them to the whole dataset, not to the subsetted data. So when you write

aes(weight=votes/sum(votes))

the votes in the numerator are subsetted based on Action, but the votes in the denominator, sum(votes), are not. The same is true for the implicit grouping with facets.
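
A minimal sketch of the auxiliary-column idea, assuming a small made-up data frame `toy` in place of movies: each row's share of its group's votes is computed before plotting, here with base R's ave() rather than data.table, so ggplot2 receives an ordinary column and never has to evaluate sum() itself. The column names `w_global` and `w_group` are hypothetical.

```r
# Made-up stand-in for movies: columns rating, votes and Action
toy <- data.frame(
  rating = c(5.1, 6.3, 7.8, 4.2, 8.0, 6.9),
  votes  = c(10, 30, 60, 5, 15, 180),
  Action = c(0, 0, 0, 1, 1, 1)
)

# What aes(weight = votes/sum(votes)) amounts to: one grand total for all rows
toy$w_global <- toy$votes / sum(toy$votes)

# Auxiliary column: divide by the vote total within each Action group
toy$w_group <- ave(toy$votes, toy$Action, FUN = function(v) v / sum(v))

print(tapply(toy$w_global, toy$Action, sum))  # 1/3 and 2/3
print(tapply(toy$w_group, toy$Action, sum))   # 1 and 1

# Plot with the precomputed column; skipped if ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(rating, weight = w_group, colour = factor(Action))) +
    ggplot2::geom_density(fill = NA)
  # print(p) draws one curve per Action group
}
```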

If someone else has a way around this I'd love to hear it.
