Is there any way to draw a UMAP or t-SNE plot for a data table?


Question

I have a large file (a small subset is shown below) with about 200 columns. I can plot a PCA of it with the PCA function, but the result looks messy with that many columns, so I think t-SNE or UMAP might work better. However, I have not managed to produce a plot with either of them.

I would like to show the relationships and clustering between the columns (by column name) in a plot. The columns A, B, and so on come from different studies, and I want to check whether there is any batch effect between them.

I would appreciate any help!

DF:

                            A              B             C           D
1:540450-541070    0.12495878     0.71580434    0.65399319  1.04879290
1:546500-548198    0.41064192     0.26136554    0.11939805  0.28721360
1:566726-567392    0.00000000     0.06663644    0.45661687  0.24408844
1:569158-570283    0.34433086     0.27614141    0.54063437  0.21675053
1:603298-605500    0.07036734     0.42324126    0.23017472  0.29530045
1:667800-669700    0.20388011     0.11678913    0.00000000  0.12833913
1:713575-713660    7.29171225     12.53078648   2.38515165  3.82500941
1:724497-727160    0.40730086     0.26664585    0.45678834  0.12209005
1:729399-731900    0.74345727     0.49685579    0.72956458  0.32499580

Recommended answer

Here are some examples using the iris dataset, since your example data is too small for these dimensionality reductions.

For t-SNE:

library(ggplot2)
library(Rtsne)

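dat <- iris

# Rtsne() stops on duplicated rows by default, so drop them first.
# Column 5 (Species) is a label, not a feature, so it is excluded.
tsne <- Rtsne(dat[!duplicated(dat), -5])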

df <- data.frame(x = tsne$Y[,1],
                 y = tsne$Y[,2],
                 Species = dat[!duplicated(dat), 5])

ggplot(df, aes(x, y, colour = Species)) +
  geom_point()
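
The call above relies on Rtsne's default perplexity of 30, which is fine for the roughly 150 rows of iris but not for much smaller inputs. The Rtsne documentation states the rule as 3 * perplexity < nrow(X) - 1. As a minimal sketch, the helper below (the name max_perplexity is made up for illustration and is not part of Rtsne) computes the largest whole-number perplexity that satisfies that documented rule:

```r
# Hypothetical helper, not part of the Rtsne package.
# Returns the largest whole-number perplexity p with 3 * p < n - 1,
# which is the limit given in the Rtsne documentation.
max_perplexity <- function(n) ceiling((n - 1) / 3) - 1

n_cars <- nrow(mtcars)    # 32 rows
max_perplexity(n_cars)    # 10, so the default perplexity of 30 is too large here
```

Any value at or below this limit is accepted; which value gives the most readable plot still has to be found by trying a few.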

For UMAP:

library(umap)

# Use a distinct name for the result to avoid confusion with the umap() function.
umap_res <- umap(dat[!duplicated(dat), -5])

df <- data.frame(x = umap_res$layout[,1],
                 y = umap_res$layout[,2],
                 Species = dat[!duplicated(dat), 5])

ggplot(df, aes(x, y, colour = Species)) +
  geom_point()

Suppose we have data where every subject is a column:

dat <- t(mtcars)

The only extra steps are to transpose the data before feeding it to t-SNE/UMAP and then to copy the column names into the plotting data:

# With only 32 samples, the default perplexity of 30 is rejected as too large,
# so a smaller value is set explicitly.
tsne <- Rtsne(t(dat), perplexity = 5)

df <- data.frame(x = tsne$Y[,1],
                 y = tsne$Y[,2],
                 car = colnames(dat))

ggplot(df, aes(x, y, colour = car)) +
  geom_point()
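
The same transposition works for UMAP. The sketch below repeats the example with the umap package and colours the points by a batch label instead of by name, which is closer to the batch-effect check in the question. The batch vector is an assumption made for illustration: mtcars has no study labels, so the cylinder count stands in for "study of origin". With real data you would supply one label per column yourself. The n_neighbors and random_state values are arbitrary choices, not recommendations.

```r
library(ggplot2)
library(umap)

# Features in rows, samples in columns, as in the question.
dat <- t(mtcars)

# Assumed batch labels, one per column of `dat`.
# The cylinder count is only a stand-in for the study each sample came from.
batch <- factor(mtcars$cyl)

# Start from the package defaults and adjust two settings.
cfg <- umap.defaults
cfg$n_neighbors <- 10   # kept below the number of samples (32 here)
cfg$random_state <- 1   # fixed seed so the layout is reproducible

# umap() expects samples in rows, so transpose back.
umap_res <- umap(t(dat), config = cfg)

df <- data.frame(x = umap_res$layout[, 1],
                 y = umap_res$layout[, 2],
                 sample = colnames(dat),
                 batch = batch)

ggplot(df, aes(x, y, colour = batch)) +
  geom_point()
```

If the points separate mainly by batch rather than by the biology you expect, that suggests a batch effect worth investigating. Note that the four-column sample in the question has fewer samples than the default n_neighbors of 15, so that setting would need to be lowered there as well.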
