Controlling the order of points in ggplot2?
Problem description
I'm plotting a dense scatter plot in ggplot2 where each point might be labeled by a different color:
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size))
When I do this, the scatter point labeled "point" (green) is plotted on top of the red points which have the label "a". What controls this z ordering in ggplot, i.e. what controls which point is on top of which?
For example, what if I wanted all the "a" points to be on top of all the points labeled "point" (meaning they would sometimes partially or fully hide that point)? Does this depend on alphanumerical ordering of labels?
I'd like to find a solution that can be translated easily to rpy2.
Recommended answer
ggplot2 will create plots layer-by-layer and within each layer, the plotting order is defined by the geom type. The default is to plot in the order that they appear in the data.
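Since geom_point draws rows in data order, one way to get either ordering asked about in the question is to reorder the rows before plotting. The following is only a minimal sketch that rebuilds the df from the question. The names df_point_last and df_point_first are introduced here for illustration and are not part of the original answer.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(x = rnorm(500))
df$y <- rnorm(500) * 0.1 + df$x
df$label <- "a"
df$label[50] <- "point"

# order() sorts FALSE before TRUE, so rows labeled "point" move to the end
# and are drawn last, i.e. on top of the "a" points.
df_point_last <- df[order(df$label == "point"), ]

# Reverse case from the question: "point" rows come first,
# so the "a" points are drawn over them and may hide them.
df_point_first <- df[order(df$label != "point"), ]

p_top    <- ggplot(df_point_last,  aes(x = x, y = y, color = label)) +
  geom_point(size = 2)
p_hidden <- ggplot(df_point_first, aes(x = x, y = y, color = label)) +
  geom_point(size = 2)
```

Printing p_top or p_hidden renders the plot. This relies on the data-order behaviour described above, so it is less explicit than drawing a separate layer.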
Where this is different, it is noted. For example
geom_line
Connect observations, ordered by x value.
and
geom_path
Connect observations in data order.
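The difference between the two geoms is easiest to see on data whose x values are not already sorted. A small sketch, using a made-up data frame called zigzag (not from the original answer):

```r
library(ggplot2)

# Four observations whose x values are deliberately out of order
zigzag <- data.frame(x = c(1, 3, 2, 4), y = c(1, 2, 3, 4))

# geom_line connects the observations after ordering them by x
p_line <- ggplot(zigzag, aes(x = x, y = y)) + geom_line()

# geom_path connects the observations in the order of the rows
p_path <- ggplot(zigzag, aes(x = x, y = y)) + geom_path()
```

Printing p_line and p_path side by side should show a left-to-right line for the first and a path that doubles back for the second.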
There are also known issues regarding the ordering of factors, and it is interesting to note the response of the package author Hadley:
The display of a plot should be invariant to the order of the data frame - anything else is a bug.
With this quote in mind, layers are drawn in the order in which they are specified, so overplotting can be an issue, especially when creating dense scatter plots. So if you want a consistent plot (and not one that relies on the order in the data frame), you need to think a bit more.
If you want certain values to appear above other values, you can use the subset argument to create a second layer that is guaranteed to be drawn afterwards. You will need to explicitly load the plyr package so that .() will work.
set.seed(1234)
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
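library(ggplot2)
library(plyr)  # provides .(), used by the subset argument below
# first layer: all points; second layer: only the "point" rows, drawn on top
ggplot(df) + geom_point(aes(x = x, y = y, color = label, size = size)) +
  geom_point(aes(x = x, y = y, color = label, size = size),
             subset = .(label == 'point'))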
In ggplot2_2.0.0, the subset argument is deprecated. Instead, use e.g. base::subset to select the relevant rows and pass the result to the data argument. There is no need to load plyr:
ggplot(df) +
  geom_point(aes(x = x, y = y, color = label, size = size)) +
  geom_point(data = subset(df, label == 'point'),
             aes(x = x, y = y, color = label, size = size))
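The question also asks about the reverse case, where all the "a" points should sit on top of the one labeled "point". The same layering idea should work with the order of the layers swapped. This is a sketch only: the names df_point and df_a are introduced here, and setting size as a constant outside aes() is a simplification not taken from the original answer.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(x = rnorm(500))
df$y <- rnorm(500) * 0.1 + df$x
df$label <- "a"
df$label[50] <- "point"

# Split the data so that each group gets its own layer
df_point <- subset(df, label == "point")
df_a     <- subset(df, label == "a")

# Layers are drawn in the order they are added: the "point" layer goes
# first, and the "a" layer is then drawn over it.
p <- ggplot(mapping = aes(x = x, y = y, color = label)) +
  geom_point(data = df_point, size = 2) +
  geom_point(data = df_a, size = 2)
```

Printing p renders the plot. Swapping the two geom_point() lines puts the "point" observation back on top.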
Or use alpha
Another approach to avoid the problem of overplotting would be to set the alpha (transparency) of the points. This will not be as effective as the explicit second layer approach above; however, with judicious use of scale_alpha_manual you should be able to get something to work.
For example
# set alpha = 1 (no transparency) for your point(s) of interest
# and a low value otherwise
ggplot(df) + geom_point(aes(x = x, y = y, color = label, size = size, alpha = label)) +
  scale_alpha_manual(guide = 'none', values = c(a = 0.2, point = 1))