Filter ggplot2 density plot by number of observations


Question

Is it possible to filter out subsets of the data that have small numbers of observations within a ggplot2 call?

For example, take the following plot:

qplot(price, data = diamonds, geom = "density", colour = cut)

The plot is a little busy, and I would like to exclude the cut values with a small number of observations, i.e.,

> xtabs(~cut,diamonds)
cut
     Fair      Good Very Good   Premium     Ideal 
     1610      4906     12082     13791     21551

the Fair and Good levels of the cut factor.

I want a solution that works for an arbitrary data set and, if possible, can select not just by a threshold number of observations but also, for example, by the top 3.

Recommended answer

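library(ggplot2)
library(plyr)  # count(), arrange(), desc() and .() come from plyr

# Keep only the three most frequent cut levels, then draw one density per facet.
top3 <- arrange(count(diamonds, .(cut)), desc(freq))[1:3, ]$cut
ggplot(subset(diamonds, cut %in% top3), aes(price, colour = cut)) +
  geom_density() + facet_grid(~cut)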

  1. count tallies how often each cut value occurs and returns the result as a data.frame with a freq column.
  2. arrange orders a data.frame by the specified column.
  3. desc reverses the sort order, so the most frequent values come first.
  4. Finally, subset keeps the rows whose cut is among the top 3, tested with %in%.
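
The answer above covers the top-3 case only. The question also asks for selection by a threshold number of observations, so here is a minimal sketch of how both could be handled with base R's table(). The helper name frequent_levels and its arguments min_n and top_n are hypothetical, introduced here for illustration; they are not part of ggplot2, plyr, or the original answer.

```r
library(ggplot2)  # provides the diamonds data set and the plotting functions

# Hypothetical helper (not from the original answer): returns the values of
# column `var` in `data` that occur at least `min_n` times and/or are among
# the `top_n` most frequent values.
frequent_levels <- function(data, var, min_n = NULL, top_n = NULL) {
  counts <- sort(table(data[[var]]), decreasing = TRUE)
  if (!is.null(min_n)) counts <- counts[counts >= min_n]
  if (!is.null(top_n)) counts <- head(counts, top_n)
  names(counts)
}

# Threshold-based selection: keep cut levels with at least 5000 observations.
keep <- frequent_levels(diamonds, "cut", min_n = 5000)

ggplot(diamonds[diamonds$cut %in% keep, ], aes(price, colour = cut)) +
  geom_density()
```

With the counts shown in the question, a threshold of 5000 drops Fair (1610) and Good (4906) and keeps Ideal, Premium and Very Good. Calling frequent_levels(diamonds, "cut", top_n = 3) should select the same three levels, and because the column name is passed as a string the helper can be reused on other data sets.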
