Apply PCA on very large sparse matrix

Views: 467 | Published: 2020/4/27 3:37:03 | Tags: language-agnostic, machine-learning, sparse-matrix, pca

Problem description

I am doing a text classification task in R, and I have a document-term matrix of size 22,490 by 120,000 (only 4 million non-zero entries, less than 1% of the entries). I want to reduce the dimensionality using PCA (Principal Component Analysis). Unfortunately, R cannot handle a matrix this large, so I stored the sparse matrix in a file in the Matrix Market format, hoping to do the PCA with some other tool.
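For readers who want to try this outside R, here is a minimal sketch of reading a Matrix Market file into a sparse matrix with SciPy. The file name `dtm.mtx` and the tiny random matrix are hypothetical stand-ins for the real document-term matrix; the round trip through a temporary file is only there to make the example self-contained.

```python
import os
import tempfile

import scipy.sparse as sp
from scipy.io import mmread, mmwrite

# Hypothetical stand-in for the real 22,490 x 120,000 document-term matrix.
dtm = sp.random(50, 200, density=0.01, format="coo", random_state=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dtm.mtx")   # hypothetical file name
    mmwrite(path, dtm)                    # write in Matrix Market coordinate format
    loaded = mmread(path).tocsr()         # mmread yields COO; convert to CSR for fast products

# Fraction of entries that are stored (non-zero).
density = loaded.nnz / (loaded.shape[0] * loaded.shape[1])
print(loaded.shape, loaded.nnz, f"{density:.4%}")
```

Only the non-zero entries are held in memory after loading, so a file with 4 million entries should be small compared with 24 GB of RAM.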

Could anyone suggest useful libraries (in any programming language) that can easily run PCA on a matrix of this scale? Alternatively, how could I do a longhand PCA myself, that is, first compute the covariance matrix and then compute its eigenvalues and eigenvectors?
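To make the longhand route concrete, here is a small sketch in Python with NumPy and SciPy. It is an illustration on a tiny hypothetical matrix, not a recipe for the full-size problem: it relies on the identity C = (XᵀX − n·μμᵀ) / (n − 1), which avoids centering the sparse matrix itself, but the resulting covariance matrix is dense. The helper name `covariance_from_sparse` is made up for this example.

```python
import numpy as np
import scipy.sparse as sp


def covariance_from_sparse(X):
    """Sample covariance of the columns of sparse X without densifying X.

    Uses C = (X^T X - n * mu mu^T) / (n - 1). The result is a dense d x d array.
    """
    n = X.shape[0]
    mu = np.asarray(X.mean(axis=0)).ravel()   # column means, shape (d,)
    gram = (X.T @ X).toarray()                # sparse product, then dense d x d
    return (gram - n * np.outer(mu, mu)) / (n - 1)


# Tiny hypothetical matrix: 200 "documents" by 30 "terms".
X = sp.random(200, 30, density=0.05, format="csr", random_state=0)

C = covariance_from_sparse(X)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]             # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals[:5])                            # variances of the first five PCs
```

The columns of `eigvecs` are the loadings, and `eigvals` are the variances along them. The dense d x d step is exactly what stops this approach from scaling to 120,000 columns.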

What I want is to calculate all 120,000 PCs and keep only the top N PCs that account for 90% of the variance. Obviously, in this case I have to set a threshold a priori so that very small entries of the covariance matrix are set to 0; otherwise the covariance matrix will not be sparse, and at 120,000 by 120,000 it would be impossible to handle on a single machine. The loadings (eigenvectors) will also be extremely large and should be stored in a sparse format.
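One possible way to pick N without ever forming the covariance matrix is sketched below. This is a swapped-in technique, not the thresholding idea described above: it computes only the top k singular values of the implicitly centered matrix with `scipy.sparse.linalg.svds`, and obtains the total variance from the trace identity trace(C) = (‖X‖²_F − n‖μ‖²) / (n − 1). The function name `choose_n_components`, the value of `k`, and the toy data are all assumptions for illustration. Note also that after centering, a matrix with n rows has at most n − 1 components with non-zero variance.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds


def choose_n_components(X, k, target=0.90):
    """Smallest N (<= k) whose top-N PCs reach `target` of the variance, else None."""
    n, d = X.shape
    mu = np.asarray(X.mean(axis=0)).ravel()

    # Implicit centering: (X - 1 mu^T) v and (X - 1 mu^T)^T u, with X kept sparse.
    def matvec(v):
        v = np.ravel(v)
        return X @ v - mu.dot(v)

    def rmatvec(u):
        u = np.ravel(u)
        return X.T @ u - mu * u.sum()

    A = LinearOperator((n, d), matvec=matvec, rmatvec=rmatvec, dtype=np.float64)
    s = np.sort(svds(A, k=k, return_singular_vectors=False))[::-1]

    # Total variance = trace of the covariance matrix, computed from sparse data only.
    total_var = (X.multiply(X).sum() - n * mu.dot(mu)) / (n - 1)
    cum_ratio = np.cumsum(s**2 / (n - 1)) / total_var

    hits = np.nonzero(cum_ratio >= target)[0]
    n_components = int(hits[0]) + 1 if hits.size else None
    return n_components, cum_ratio


# Toy data: 300 x 60 sparse matrix in which 5 columns are scaled up to dominate the variance.
scale = np.ones(60)
scale[:5] = 20.0
X = (sp.random(300, 60, density=0.05, format="csr", random_state=0) @ sp.diags(scale)).tocsr()

k = 10
n_components, cum_ratio = choose_n_components(X, k)
print(n_components, np.round(cum_ratio, 3))
```

If the function returns `None`, the top k components do not reach the target and k would need to be increased. Whether a modest k reaches 90% on a real document-term matrix depends on the data and cannot be known in advance.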

Thanks a lot for your help!

Note: I am using a machine with 24 GB of RAM and 8 CPU cores.
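A quick back-of-the-envelope check of the sizes involved, assuming 8-byte floats and a CSR layout with 4-byte indices (both are assumptions; the question does not state a storage format beyond Matrix Market):

```python
# Dimensions taken from the question.
n_docs, n_terms, nnz = 22_490, 120_000, 4_000_000

GIB = 1024**3
FLOAT_BYTES, INDEX_BYTES = 8, 4   # assumed: float64 values, int32 indices

density = nnz / (n_docs * n_terms)

# Dense 120,000 x 120,000 covariance matrix.
dense_cov_bytes = n_terms**2 * FLOAT_BYTES

# The data matrix stored densely (which is what explicit centering would produce).
dense_data_bytes = n_docs * n_terms * FLOAT_BYTES

# The same matrix in CSR: values + column indices + row pointers.
csr_bytes = nnz * (FLOAT_BYTES + INDEX_BYTES) + (n_docs + 1) * INDEX_BYTES

print(f"density:           {density:.3%}")
print(f"dense covariance:  {dense_cov_bytes / GIB:.1f} GiB")
print(f"dense data matrix: {dense_data_bytes / GIB:.1f} GiB")
print(f"sparse (CSR) data: {csr_bytes / GIB:.3f} GiB")
```

Under these assumptions the dense covariance matrix alone needs about 115 GB, far beyond 24 GB of RAM, while the sparse data matrix takes well under 1 GB. Explicitly centering the data would also destroy its sparsity and nearly fill the available memory.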
