Density of each group of weighted geom_density sum to one
Problem description
How can I group a density plot and have the density of each group sum to one, when using weighted data?
The ggplot2 help for geom_density() suggests a hack for using weighted data: dividing by the sum of the weights. But when the data is grouped, this means that the combined density of all groups totals one. I would like the density of each group to total one.
I have found two clumsy ways to do this. The first is to treat each group as a separate dataset:
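library(ggplot2)
library(ggplot2movies)  # provides the movies data in ggplot2 >= 2.0.0
m <- ggplot()
# One layer per subset, so sum(votes) is taken within each subset
m + geom_density(data = movies[movies$Action == 0, ], aes(rating, weight = votes/sum(votes)), fill=NA, colour="black") +
  geom_density(data = movies[movies$Action == 1, ], aes(rating, weight = votes/sum(votes)), fill=NA, colour="blue")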
Obvious disadvantages are the manual handling of factor levels and aesthetics. I also tried using the windowing functionality of the data.table package to create a new column for the total votes per Action group, dividing by that instead:
library(data.table)
movies.dt <- data.table(movies)
setkey(movies.dt, Action)
# Total votes within each Action group, added as a new column by reference
movies.dt[, votes.per.group := sum(votes), by = Action]
m <- ggplot(movies.dt, aes(x=rating, weight=votes/votes.per.group, group = Action, colour = Action))
m + geom_density(fill=NA)
Are there neater ways to do this? Because of the size of my tables, I'd rather not replicate rows by their weighting for the sake of using frequency.
Solution
I think an auxiliary table might be your only option. I had a similar problem here. The issue, it seems, is that when ggplot uses aggregating functions in aes(...), it applies them to the whole dataset, not the subsetted data. So when you write
aes(weight=votes/sum(votes))
the votes in the numerator is subsetted based on Action, but votes in the denominator, sum(votes), is not. The same is true for the implicit grouping with facets.
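The effect described above can be checked numerically without drawing anything. The sketch below uses a small made-up data frame (toy is hypothetical and only stands in for the movies table) and reproduces in base R what a single sum(votes) over all rows would give each group.

```r
# Hypothetical toy data standing in for the movies table
toy <- data.frame(
  Action = c(0, 0, 0, 1, 1),
  rating = c(5.0, 6.5, 7.0, 4.0, 8.0),
  votes  = c(10, 30, 60, 150, 150)
)

# One denominator for all rows, as when sum(votes) is not taken per group
w_global <- toy$votes / sum(toy$votes)

# Total weight that lands in each Action group:
# group 0 holds 100 of the 400 votes, group 1 holds 300 of the 400
share <- tapply(w_global, toy$Action, sum)
print(share)
```

Each group's total weight is its share of all votes, so only the combined total is one.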
If someone else has a way around this I'd love to hear it.
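For what it is worth, the auxiliary column does not need a separate table or data.table. A minimal sketch, assuming the same kind of data as in the question (toy and the column name w are made up for illustration), builds the per-group weight in one line with base R's ave() and checks that each group sums to one. The plotting part is guarded because it assumes ggplot2 is installed.

```r
# Hypothetical toy data standing in for the movies table
toy <- data.frame(
  Action = c(0, 0, 0, 1, 1),
  rating = c(5.0, 6.5, 7.0, 4.0, 8.0),
  votes  = c(10, 30, 60, 150, 150)
)

# ave() returns, for every row, the sum of votes within that row's Action group
toy$w <- toy$votes / ave(toy$votes, toy$Action, FUN = sum)

# Each group's weights now sum to one
group_totals <- tapply(toy$w, toy$Action, sum)
print(group_totals)

# Build the plot only if ggplot2 is available (an assumption of this sketch)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = rating, weight = w, colour = factor(Action))) +
    geom_density(fill = NA)
  # print(p) draws the two curves
}
```

This is still an extra column, so it does not get around the limitation described above; it only keeps the factor levels and aesthetics in a single layer.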