Controlling order of points in ggplot2 in R?


Question

Suppose I'm plotting a dense scatter plot in ggplot2 in R where each point might be labeled by a different color:

df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size))

When I do this, the scatter point labeled "point" (green) is plotted on top of the red points which have the label "a". What controls this z ordering in ggplot, i.e. what controls which point is on top of which? For example, what if I wanted all the "a" points to be on top of all the points labeled "point" (meaning they would sometimes partially or fully hide that point)? Does this depend on alphanumerical ordering of labels? I'd like to find a solution that can be translated easily to rpy2. thanks

Solution

ggplot2 will create plots layer-by-layer and within each layer, the plotting order is defined by the geom type. The default is to plot in the order that they appear in the data.

Where a geom deviates from this default, its documentation says so. For example:

geom_line

Connect observations, ordered by x value.

and

geom_path

Connect observations in data order
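
The difference between the two geoms can be checked with a small sketch. The data frame `zigzag` is a made-up example, not part of the original answer, and `layer_data()` is used here only to inspect the rows each layer ends up drawing.

```r
library(ggplot2)

# Hypothetical toy data, deliberately not sorted by x
zigzag <- data.frame(x = c(3, 1, 2), y = c(1, 2, 3))

# geom_line sorts the observations by x before connecting them
p_line <- ggplot(zigzag, aes(x = x, y = y)) + geom_line()

# geom_path connects the observations in the order of the rows
p_path <- ggplot(zigzag, aes(x = x, y = y)) + geom_path()

# Extract the x values in the order each layer will draw them
line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x
```

Comparing `line_x` with `path_x` shows whether a geom has reordered the data before drawing.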


There are also known issues regarding the ordering of factors, and it is interesting to note the response of the package author, Hadley:

The display of a plot should be invariant to the order of the data frame - anything else is a bug.


With this quote in mind, layers are drawn in the order in which they are specified, so overplotting can be an issue, especially in dense scatter plots. If you want a consistent plot (and not one that relies on the order of rows in the data frame), you need to control the drawing order explicitly.
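
If relying on row order is acceptable, the simplest lever is to reorder the data frame before plotting. The sketch below assumes a ggplot2 version in which `geom_point` draws rows in data order, as described above for the default; the name `df_a_on_top` is illustrative and not from the original answer.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(x = rnorm(500))
df$y <- rnorm(500) * 0.1 + df$x
df$label <- "a"
df$label[50] <- "point"

# Move the "point" rows to the front so they are drawn first;
# the "a" rows follow and therefore end up on top.
df_a_on_top <- df[order(df$label != "point"), ]

ggplot(df_a_on_top) +
  geom_point(aes(x = x, y = y, color = label), size = 2)
```

Sorting with `order(df$label == "point")` instead would place the "point" rows last, so they are drawn on top.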


Create a second layer

If you want certain values to appear on top of others, you can use the subset argument to create a second layer, which is guaranteed to be drawn after the first. You need to load the plyr package explicitly so that .() works.

set.seed(1234)
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
library(plyr)
ggplot(df) + geom_point(aes(x = x, y = y, color = label, size = size)) +
  geom_point(aes(x = x, y = y, color = label, size = size), 
             subset = .(label == 'point'))

Update

As of ggplot2 2.0.0, the subset argument is deprecated. Instead, use e.g. base::subset to select the relevant rows and pass them via the data argument. There is no need to load plyr:

ggplot(df) +
  geom_point(aes(x = x, y = y, color = label,  size = size)) +
  geom_point(data = subset(df, label == 'point'),
             aes(x = x, y = y, color = label, size = size))
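
The question asks for the opposite stacking: all the "a" points on top of the one labeled "point". A minimal sketch of the same two-layer idea, using a per-layer `data` argument, is shown here. Moving the shared aesthetics into `ggplot()` and setting a constant size outside `aes()` are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(x = rnorm(500))
df$y <- rnorm(500) * 0.1 + df$x
df$label <- "a"
df$label[50] <- "point"

# Layers are drawn in the order they are added:
# first the "point" rows, then the "a" rows on top of them.
p <- ggplot(mapping = aes(x = x, y = y, color = label)) +
  geom_point(data = subset(df, label == "point"), size = 2) +
  geom_point(data = subset(df, label == "a"), size = 2)

p
```

Because each label lives in its own layer, the stacking no longer depends on the row order of `df`.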


Or use alpha

Another way to reduce the overplotting problem is to set the alpha (transparency) of the points. This is not as effective as the explicit second-layer approach above, but with judicious use of scale_alpha_manual you should be able to get something that works.

For example:

# set alpha = 1 (no transparency) for your point(s) of interest
# and a low value otherwise
ggplot(df) + geom_point(aes(x = x, y = y, color = label, size = size, alpha = label)) +
  scale_alpha_manual(guide = 'none', values = c(a = 0.2, point = 1))
