(R) Visualizing a data set with a large number of variables using PCA (ggbiplot)

Views: 88 Published: 2021/5/10 19:59:10 Tags: r ggplot2 pca


Question

My dataset has 100 samples and 17,000 variables. I want to run PCA and visualize the data, but with this many variables the biplot is unreadable. How can I control the number of arrows in ggbiplot or biplot, that is, show only the variables that contribute most? Some sample code is below:

library(ggbiplot)  # provides ggbiplot()

# 100 samples x 17000 variables of random data
data <- matrix(rnorm(1700000), nrow = 100, ncol = 17000)
colnames(data) <- paste("X", 1:ncol(data), sep = "")
pca <- prcomp(data, scale. = TRUE, center = TRUE)

biplot(pca)
print(ggbiplot(pca, obs.scale = 1, var.scale = 1,
               groups = c(rep('a', 30), rep('b', 70))))

Answer

I assume you got a recent version of ggbiplot from GitHub (19 Jun 2015, https://github.com/vqv/ggbiplot). In that version, I don't think there is a clean way to reduce the number of arrows. You would have to modify the original function by subsetting df.v in the two plotting calls:

around line 89:

g <- g + geom_segment(data = df.v[1:5, ],  # SUBSET HERE
                      aes(x = 0, y = 0, xend = xvar, yend = yvar),
                      arrow = arrow(length = unit(1/2, "picas")),
                      color = muted("red"))

and around line 127:

g <- g + geom_text(data = df.v[1:5, ],  # SUBSET HERE
                   aes(label = varname, x = xvar, y = yvar,
                       angle = angle, hjust = hjust),
                   color = "darkred", size = varname.size)
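Note that df.v[1:5, ] keeps the first five variables, which are not necessarily the ones that contribute most. As a rough alternative that leaves ggbiplot untouched, the sketch below ranks variables by the squared length of their loading vector on PC1 and PC2 (scaled by each component's standard deviation) and passes a trimmed copy of the prcomp object to base biplot(). The ranking criterion, the names k, contrib, top_vars and pca_top, and the reduced column count (2000 instead of 17000, only to keep the example fast) are all assumptions made for illustration. Whether ggbiplot accepts such a trimmed object has not been checked here.

```r
# Minimal sketch: keep only the k variables with the longest arrows on PC1/PC2.
set.seed(1)                                    # reproducible random example
data <- matrix(rnorm(100 * 2000), nrow = 100)  # fewer columns than the question, for speed
colnames(data) <- paste0("X", seq_len(ncol(data)))
pca <- prcomp(data, center = TRUE, scale. = TRUE)

k <- 5  # hypothetical number of arrows to keep

# Loadings on the first two PCs, scaled by each PC's standard deviation.
# Ranking by squared length is an assumption, one of several possible criteria.
loadings <- sweep(pca$rotation[, 1:2], 2, pca$sdev[1:2], `*`)
contrib  <- rowSums(loadings^2)
top_vars <- names(sort(contrib, decreasing = TRUE))[seq_len(k)]

# Copy the prcomp object and drop all other rows of the rotation matrix;
# biplot() then draws arrows only for the remaining variables.
pca_top <- pca
pca_top$rotation <- pca$rotation[top_vars, , drop = FALSE]
biplot(pca_top)
```

The scores in pca_top$x are unchanged, so all 100 samples are still plotted at the same positions as in the full biplot; only the set of arrows shrinks.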
