Label specific points in ggplot2


Question

I'm trying to label individual points of interest in a scatter plot in ggplot2. My data is a CSV file with multiple columns:

Gene       chr    start    stop      A      B       C       D      E
APOBEC3G   chr22  39472992 39483773  97.06  214.56  102.34  20.00  19.45  
APOBEC3C ... 

And so on. I am plotting column A against column B with ggplot. That works, and I can label all of the points with the corresponding gene name. However, how do I highlight (e.g. change the color or size of) individual genes of interest? In other words: how do I make the data points for a list of 10 genes I have on hand stand out, or how can I annotate my genes of interest on the scatter plot without annotating all the other points?

I've tried using the subset function, but as an R novice I got stuck.

Recommended answer

You need to create a new variable that distinguishes the observations you want to highlight.

Let's simulate a data.frame:

df <- data.frame(genes=letters,
                 A=runif(26),
                 B=runif(26))

Your current plot should look like this (points + labels):

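library(ggplot2)  # provides ggplot(), geom_point(), geom_text()

# Scatter plot of A against B with every point labelled by gene name
ggplot(data=df,aes(x=A,y=B,label=genes)) +
  geom_point() +
  geom_text(hjust=-1,vjust=1)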

In order to highlight some genes, we create a new variable, group. I assign "important" to some arbitrary genes. You may want to do this programmatically, for instance by looking for outliers.

df$group <- "not important"
df$group[df$genes %in% c("d","g","b")] <- "important"
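If the genes of interest are not known in advance, the group variable can also be derived from the data. The following is only a sketch of one possible outlier rule; the z-score criterion and the 1.5 cutoff are assumptions chosen for illustration, not something the original answer specifies.

```r
set.seed(1)  # fixed seed so the simulated data is reproducible
df <- data.frame(genes = letters,
                 A = runif(26),
                 B = runif(26))

# Hypothetical rule: a gene is "important" when its B value lies more
# than 1.5 standard deviations away from the mean of B.
z <- (df$B - mean(df$B)) / sd(df$B)
df$group <- ifelse(abs(z) > 1.5, "important", "not important")

table(df$group)  # how many genes fall into each group
```

Any rule that returns one label per row works the same way; with a fixed list of genes, the `%in%` assignment shown above is simpler.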

Now, there are two ways to separate the genes. The most idiomatic is to give each group its own colour (or shape, or size, etc.): one for the important genes, one for the unimportant ones. This is easily achieved by mapping the new variable to colour (or size, shape, etc.):

ggplot(data=df,aes(x=A,y=B,label=genes)) +
  geom_point(aes(color=group)) +
  geom_text(hjust=-1,vjust=1)
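With the mapping above, ggplot2 picks the two colours from its default palette. If specific colours are wanted, a manual scale can be added. This is a minimal sketch; the colour choices ("red" and "grey50") and the fixed seed are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))
df$group <- "not important"
df$group[df$genes %in% c("d", "g", "b")] <- "important"

# Map group to colour, then choose the colour for each level explicitly
p <- ggplot(data = df, aes(x = A, y = B, label = genes)) +
  geom_point(aes(color = group)) +
  geom_text(hjust = -1, vjust = 1) +
  scale_color_manual(values = c("important" = "red",
                                "not important" = "grey50"))
p
```

Naming the entries of `values` ties each colour to a group level regardless of the order in which the levels appear.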

However, you could also plot each group on a separate layer to clearly highlight the important genes. In that case, we first add all points, and then add a new geom_point that contains only the important genes, with special attributes (here, color and size).

ggplot(data=df,aes(x=A,y=B,label=genes)) +
  geom_point() +
  geom_point(data=df[df$group == "important",],color="red",size=3) +
  geom_text(hjust=-1,vjust=1)
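The same layering idea can be applied to the labels, which addresses the part of the question about annotating only the genes of interest. The sketch below passes a subset of the data to geom_text as well; the variable name `important` and the grey colour for the background points are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))
df$group <- "not important"
df$group[df$genes %in% c("d", "g", "b")] <- "important"

# Rows to highlight; subset() keeps only the important genes
important <- subset(df, group == "important")

p <- ggplot(data = df, aes(x = A, y = B)) +
  geom_point(colour = "grey50") +                          # all genes
  geom_point(data = important, colour = "red", size = 3) + # highlighted genes
  geom_text(data = important, aes(label = genes),          # labels for those only
            hjust = -1, vjust = 1)
p
```

Because the label aesthetic is set only inside geom_text, the other layers ignore it and the unlabelled points stay uncluttered.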
