Labeling x-axis with another column from dataframe
Question
I have a dataframe derived from the output of running GWAS. Each row is a SNP in the genome, with its Chromosome, Position, and P.value. From this dataframe, I'd like to generate a Manhattan Plot where the x-axis goes from the first SNP on Chr 1 to the last SNP on Chr 5 and the y-axis is the -log10(P.value). To do this, I generated an Index column to plot the SNPs in the correct order along the x-axis. However, I would like the x-axis to be labeled by the Chromosome column instead of the Index. Unfortunately, I cannot map Chromosome to the x-axis directly, because then all the SNPs on any given Chromosome would be plotted in a single column of points.
Here is an example dataframe to work with:
library(tidyverse)
df <- tibble(Index = seq(1, 500, by = 1),
Chromosome = rep(seq(1, 5, by = 1), each = 100),
Position = rep(seq(1, 500, by = 5), 5),
P.value = sample(seq(1e-5, 1e-2, by = 1e-5), 500, replace = TRUE))
And the plot that I have so far:
df %>%
ggplot(aes(x = Index, y = -log10(P.value), color = as.factor(Chromosome))) +
geom_point()
I have tried playing around with the scale_x_discrete option, but haven't been able to figure out a solution.
Here is an example of a Manhattan Plot I found online. See how the x-axis is labeled according to the Chromosome? That is my desired output.
Recommended answer
geom_jitter is your friend:
df %>%
ggplot(aes(x = Chromosome, y = -log10(P.value), color = as.factor(Chromosome))) +
geom_jitter()
Edit, in response to the OP's comment:
Using base R plot, you could do:
cols = sample(colors(), length(unique(df$Chromosome)))[df$Chromosome]
plot(df$Index, -log10(df$P.value), col=cols, xaxt="n")
axis(1, at=c(50, 150, 250, 350, 450), labels=c(1:5))
You'll need to specify exactly where you want each chromosome label to be for the axis function. Thanks to this post.
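If you would rather not hard-code the label positions, they can be derived from the data. The following is only a sketch under some assumptions: the data frame is rebuilt here with a fixed seed so the block runs on its own, and the names label_pos and chrom_cols are made up for illustration. It places each label at the midpoint of that chromosome's Index range.

```r
# Hypothetical example data shaped like the question's data frame.
# Assumption: a fixed seed is used only so the sketch is reproducible.
set.seed(1)
df <- data.frame(
  Index = 1:500,
  Chromosome = rep(1:5, each = 100),
  P.value = sample(seq(1e-5, 1e-2, by = 1e-5), 500, replace = TRUE)
)

# Midpoint of each chromosome's Index range, used as the tick position.
label_pos <- tapply(df$Index, df$Chromosome, function(i) mean(range(i)))

# One colour per chromosome, looked up through the factor's integer codes.
chrom_cols <- rainbow(length(label_pos))[as.integer(factor(df$Chromosome))]

# Suppress the default x-axis, then draw one labelled tick per chromosome.
plot(df$Index, -log10(df$P.value), col = chrom_cols, xaxt = "n",
     xlab = "Chromosome", ylab = "-log10(P.value)")
axis(1, at = as.numeric(label_pos), labels = names(label_pos))
```

With real GWAS output the chromosomes will have different numbers of SNPs, so computing the positions this way avoids having to update the at values by hand.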
Edit #2:
I found an answer using ggplot2. You can use the annotate function to plot your points by coordinates, and the scale_x_discrete function (as you suggested) to place the labels on the x-axis according to chromosome. We also need to define the pos vector to get the position of the labels for the plot. I used the mean value of the Index column for each group as an example, but you can define it by hand if you wish.
pos <- df %>%
group_by(Chromosome) %>%
summarize(avg = round(mean(Index))) %>%
pull(avg)
ggplot(df) +
annotate("point", x=df$Index, y=-log10(df$P.value),
color=as.factor(df$Chromosome)) +
scale_x_discrete(limits = pos,
labels = unique(df$Chromosome))
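Newer ggplot2 releases may reject numeric limits on a discrete scale, so here is an alternative sketch that is not the approach above. It keeps Index as a continuous x aesthetic and only relabels the ticks with scale_x_continuous(breaks = ..., labels = ...). The data frame is rebuilt with a fixed seed so the block is self-contained, and the names axis_df and p are illustrative assumptions.

```r
library(ggplot2)

# Hypothetical example data shaped like the question's data frame.
# Assumption: a fixed seed is used only so the sketch is reproducible.
set.seed(1)
df <- data.frame(
  Index = 1:500,
  Chromosome = rep(1:5, each = 100),
  P.value = sample(seq(1e-5, 1e-2, by = 1e-5), 500, replace = TRUE)
)

# One row per chromosome; Index holds the midpoint of its Index range.
axis_df <- aggregate(Index ~ Chromosome, data = df,
                     FUN = function(i) mean(range(i)))

# Points stay positioned by Index; only the tick marks and their labels
# are taken from the per-chromosome summary.
p <- ggplot(df, aes(x = Index, y = -log10(P.value),
                    color = factor(Chromosome))) +
  geom_point() +
  scale_x_continuous(breaks = axis_df$Index,
                     labels = as.character(axis_df$Chromosome)) +
  labs(x = "Chromosome", color = "Chromosome")

print(p)
```

Mapping color inside aes() also gives a legend for free, which annotate does not provide.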