Using pseudocolour in ggplot2 scatter plot to indicate density

Question

Does someone know how to create a graph like the one in the screenshot? I've tried to get a similar effect by adjusting alpha, but this renders outliers almost invisible. I know this type of graph only from a software package called FlowJo, where it is referred to as a "pseudocolored dot plot". I'm not sure if this is an official term.

I'd like to do it specifically in ggplot2, as I need the faceting option. I attached another screenshot of one of my graphs. The vertical lines depict clusters of mutations at certain genomic regions. Some of these clusters are much denser than others. I'd like to illustrate this using the density colors.

The data is quite big and hard to simulate, but here's a try. It doesn't look like the actual data, but the data format is the same.

library(ggplot2)

chr <- c(rep(1:10,1000))
position <- runif(10000, min=0, max=5e8)
distance <- runif(10000, min=1, max=1e5)
log10dist <- log10(distance)

df1 <- data.frame(chr, position, distance, log10dist)

ggplot(df1, aes(position, log10dist)) + 
  geom_point(shape=16, size=0.25, alpha=0.5, show.legend = FALSE) +
  facet_wrap(~chr, ncol = 5, nrow = 2, scales = "free_x")

Any help is highly appreciated.

Recommended answer

library(ggplot2)
library(ggalt)
library(viridis)

chr <- c(rep(1:10,1000))
position <- runif(10000, min=0, max=5e8)
distance <- runif(10000, min=1, max=1e5)
log10dist <- log10(distance)

df1 <- data.frame(chr, position, distance, log10dist)

ggplot(df1, aes(position, log10dist)) + 
  geom_point(shape=16, size=0.25, show.legend = FALSE) +
  stat_bkde2d(aes(fill=..level..), geom="polygon") +
  scale_fill_viridis() +
  facet_wrap(~chr, ncol = 5, nrow = 2, scales = "free_x")

In practice, I'd start from the initial bandwidth guess and then work out an optimal bandwidth. Apart from taking the lazy approach of plotting every point without filtering (smoothScatter() filters out everything but the outliers, based on nrpoints), this generates a "smoothed scatterplot" like the example you posted.
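
As a rough sketch of where an initial bandwidth guess could come from: KernSmooth::dpik() is a plug-in bandwidth selector that can be applied to each axis separately. Using it here is an assumption for illustration, since the answer does not say how its bandwidth values were obtained, and the name bw_guess is hypothetical.

```r
library(KernSmooth)

set.seed(42)  # only so the sketch is reproducible

# Simulated data in the same format as the question
position  <- runif(10000, min = 0, max = 5e8)
log10dist <- log10(runif(10000, min = 1, max = 1e5))

# One plug-in bandwidth per axis (x first, then y)
bw_guess <- c(dpik(position), dpik(log10dist))
print(bw_guess)
```

A pair like this could be passed as the bandwidth argument of stat_bkde2d() and then adjusted by eye until the contours look reasonable.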

smoothScatter() uses different defaults, so it comes out a bit differently:

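par(mfrow=c(2, 5))
for (chr_id in unique(df1$chr)) {
  # Subset to one chromosome. The loop variable must not be called `chr`,
  # otherwise `chr == chr` compares the column with itself and keeps every row.
  plt_df <- dplyr::filter(df1, chr == chr_id)
  # Plot the subset, not the full data frame
  smoothScatter(plt_df$position, plt_df$log10dist, colramp=viridis)
}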
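
To make the effect of nrpoints concrete, here is a small sketch on made-up data (a dense cluster plus a few scattered points, not the data from the question). With nrpoints = 0 only the smoothed density is drawn; with the default of 100, the 100 points in the sparsest regions are overplotted as individual dots.

```r
set.seed(7)  # only so the sketch is reproducible

# Hypothetical data: a dense Gaussian cluster plus 50 scattered outliers
x <- c(rnorm(5000), runif(50, -6, 6))
y <- c(rnorm(5000), runif(50, -6, 6))

op <- par(mfrow = c(1, 2))
smoothScatter(x, y, nrpoints = 0,   main = "nrpoints = 0")
smoothScatter(x, y, nrpoints = 100, main = "nrpoints = 100 (default)")
par(op)  # restore the previous plotting layout
```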

geom_hex() is going to show the outliers, but not as distinct points:

ggplot(df1, aes(position, log10dist)) + 
  geom_point(shape=16, size=0.25, show.legend = FALSE, color="red") +
  # hexagonal bins filled by count; geom_hex() needs the hexbin package installed
  geom_hex() +
  scale_fill_viridis() +
  facet_wrap(~chr, ncol = 5, nrow = 2, scales = "free_x")

This:

ggplot(df1, aes(position, log10dist)) + 
  geom_point(shape=16, size=0.25) +
  stat_bkde2d(bandwidth=c(18036446, 0.05014539), 
              grid_size=c(128, 128), geom="polygon", aes(fill=..level..)) +
  scale_y_continuous(limits=c(3.5, 5.1)) +
  scale_fill_viridis() +
  facet_wrap(~chr, ncol = 5, nrow = 2, scales = "free_x") +
  theme_bw() +
  theme(panel.grid=element_blank())

gets you very close to the defaults smoothScatter() uses, but it hackishly accomplishes most of what the nrpoints filtering code in smoothScatter() does solely by restricting the y-axis limits.
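
A different route, not part of the answer above, is to colour every point by its local density so that outliers stay visible as distinct points without restricting the axis limits. The sketch below assumes grDevices::densCols() (which relies on KernSmooth) is acceptable for this, computes the colours separately per chromosome so that each facet gets its own density scale, and uses a hypothetical column name dens_col and an arbitrary colour ramp.

```r
library(ggplot2)

set.seed(1)  # only so the sketch is reproducible

# Simulated data in the same format as the question
df1 <- data.frame(chr       = rep(1:10, 1000),
                  position  = runif(10000, min = 0, max = 5e8),
                  log10dist = log10(runif(10000, min = 1, max = 1e5)))

# Arbitrary low-to-high density colour ramp (an illustrative choice)
ramp <- colorRampPalette(c("navy", "cyan", "yellow", "red"))

# densCols() returns one colour per point, encoding the local 2D density
df1$dens_col <- NA_character_
for (k in unique(df1$chr)) {
  idx <- df1$chr == k
  df1$dens_col[idx] <- densCols(df1$position[idx], df1$log10dist[idx],
                                colramp = ramp)
}

# scale_colour_identity() uses the precomputed colours as they are
p <- ggplot(df1, aes(position, log10dist, colour = dens_col)) +
  geom_point(shape = 16, size = 0.25) +
  scale_colour_identity() +
  facet_wrap(~chr, ncol = 5, nrow = 2, scales = "free_x")
print(p)
```

Because the colours are fixed per point, no legend is drawn, and the colours are only comparable within a facet, not across facets.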
