Color code points based on percentile in ggplot


Problem description


I have some very large files that contain a genomic position (position) and a corresponding population genetic statistic (value). I have successfully plotted these values and would like to color code the top 5% (blue) and 1% (red) of values. I am wondering if there is an easy way to do this in R.

I have explored writing a function that defines the quantiles; however, many of the quantile values end up not being unique, which causes the function to fail. I've also looked into stat_quantile, but I only managed to use it to draw lines marking the 95% and 99% levels (and some of the lines were diagonal, which did not make any sense to me). (Sorry, I am new to R.)

Any help would be much appreciated.

Here is my code: (The files are very large)

######## Combine data from multiple files
fst <- rbind(
  data.frame(key = "a1-a3", position = a1.3$V2, value = a1.3$V3),
  data.frame(key = "a1-a2", position = a1.2$V2, value = a1.2$V3),
  data.frame(key = "a2-a3", position = a2.3$V2, value = a2.3$V3),
  data.frame(key = "b1-b2", position = b1.2$V2, value = b1.2$V3),
  data.frame(key = "c1-c2", position = c1.2$V2, value = c1.2$V3)
)


######## The plot
library(ggplot2)
theme_set(theme_bw(base_size = 16))

p1 <- ggplot(fst, aes(x=position, y=value)) + 
  geom_point() + 
  facet_wrap(~key) +
  ylab("Fst") + 
  xlab("Genomic Position (Mb)") +
  scale_x_continuous(breaks=c(1e+06, 2e+06, 3e+06, 4e+06), labels=c("1", "2", "3", "4")) +
  scale_y_continuous(limits=c(0,1)) +
  theme(plot.background = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    legend.position="none",
    legend.title = element_blank()
    )
p1

Solution

This is how I would approach it - basically creating a factor defining which group each observation is in, then mapping colour to that factor.

First, some data to work with!

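# Simulated data: 200 log-normal values, 100 per key (key and position are recycled)
dat <- data.frame(key = c("a1-a3", "a1-a2"), position = 1:100, value = rlnorm(200, 0, 1))
# Get the 95th and 99th percentiles of value
quants <- quantile(dat$value, c(0.95, 0.99))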

There are plenty of ways of getting a factor that records which group each observation falls into; here is one:

dat$quant  <- with(dat, factor(ifelse(value < quants[1], 0, 
                                  ifelse(value < quants[2], 1, 2))))
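
Another of those ways, as a minimal sketch: findInterval() returns the same 0/1/2 grouping in one call. It only requires the break points to be sorted in non-decreasing order, so it still works when the two quantiles happen to coincide, a case in which cut() stops with a "'breaks' are not unique" error. The column name quant_alt, the factor labels, and the fixed seed are assumptions made for this illustration; they are not part of the original answer.

```r
# Self-contained sketch: group observations by quantile with findInterval().
set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(key = c("a1-a3", "a1-a2"), position = 1:100,
                  value = rlnorm(200, 0, 1))
quants <- quantile(dat$value, c(0.95, 0.99))

# findInterval() gives 0 below quants[1], 1 from quants[1] up to (but not
# including) quants[2], and 2 at or above quants[2].
dat$quant_alt <- factor(findInterval(dat$value, quants),
                        levels = 0:2,
                        labels = c("0-95", "95-99", "99-100"))

# Number of observations in each group
table(dat$quant_alt)
```

Because the levels are fixed at 0:2, the factor keeps all three groups even if one of them is empty, which keeps a manual colour scale aligned with the labels.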

So quant is now 0 for observations below the 95th percentile, 1 for those between the 95th and 99th percentiles, and 2 for those at or above the 99th percentile. The colour of the points in a plot can then easily be mapped to quant.

ggplot(dat, aes(position, value)) + geom_point(aes(colour = quant)) + facet_wrap(~key) +
  scale_colour_manual(values = c("black", "blue", "red"), 
                      labels = c("0-95", "95-99", "99-100")) + theme_bw()
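
Note that the thresholds above are computed over all of dat, so a facet with generally low values may show no coloured points at all. If the top 5% and 1% should instead be taken within each panel, one possible sketch is to compute the quantiles per key with ave(). Whether per-panel thresholds are appropriate depends on the analysis and is not something the original question settles; the column name quant_by_key, the labels, and the fixed seed are assumptions for illustration.

```r
# Self-contained sketch: quantile thresholds computed separately for each key.
set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(key = c("a1-a3", "a1-a2"), position = 1:100,
                  value = rlnorm(200, 0, 1))

# ave() applies FUN within each key and returns a vector aligned with the
# rows of dat, so every row carries its own group's threshold.
q95 <- ave(dat$value, dat$key, FUN = function(v) quantile(v, 0.95))
q99 <- ave(dat$value, dat$key, FUN = function(v) quantile(v, 0.99))

# Same three-way split as before, but against the per-key thresholds
dat$quant_by_key <- factor(
  ifelse(dat$value < q95, 0, ifelse(dat$value < q99, 1, 2)),
  levels = 0:2,
  labels = c("0-95", "95-99", "99-100")
)

# Group sizes within each key
table(dat$key, dat$quant_by_key)
```

quant_by_key can then be mapped to colour in place of quant in the plot above; since the factor already carries the labels, the labels argument of scale_colour_manual() is no longer needed.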
