Sparse Matrix as input to Hierarchical clustering in R
Question
I have a question about clustering with a distance matrix that is sparse.
Is there a sparse distance object format that does not expand the matrix and can work directly with the sparse representation?
Currently I'm doing the following:
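# read the sparse matrix (readMM comes from the Matrix package)
library(Matrix)
sparse <- readMM('sparse-matrix')
# convert to a dist object, as required by hclust
distance <- as.dist(sparse)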
sparse-matrix is already the correct distance matrix; it has NAs for entries that are not connected.
> sparse
[1,] . . .
[2,] 1 . .
[3,] 1 . .
> as.dist(sparse)
1 2
2 1
3 1 0
But converting it with as.dist fails with:
Error in asMethod(object) : negative length vectors are not allowed
Presumably this is because as.dist expands the matrix to its complete (dense) form. The matrix is N x N with N = 49281. This format (a dist object) is required by, for example, the hclust method.
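To put numbers on that suspicion, here is a rough back-of-the-envelope sketch in R. It assumes the dense expansion is indexed with 32-bit integers, which is one plausible source of a "negative length" error but is not confirmed by the error message itself; the variable names are only illustrative.

```r
n <- 49281

# Cells in the fully expanded n x n matrix.
dense_cells <- n^2

# Cells a dist object keeps: one per unordered pair (lower triangle).
dist_cells <- n * (n - 1) / 2

# Does the dense cell count exceed the largest 32-bit integer (2^31 - 1)?
overflows_int <- dense_cells > .Machine$integer.max

# Approximate memory for the dist object, at 8 bytes per double.
dist_gib <- dist_cells * 8 / 2^30

print(c(dense_cells = dense_cells, dist_cells = dist_cells))
print(overflows_int)
print(round(dist_gib, 2))
```

So even if the conversion succeeded, the resulting dist object alone would need roughly 9 GiB, before hclust allocates any working memory.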
A similar question on the R help list went unanswered.
Recommended answer
How would a distance matrix be sparse? There is a distance between every pair of objects, so it is actually a very dense matrix. However, because the matrix is symmetric (D = D'), a triangular matrix is sufficient to describe the mutual distances. This is in fact how the objects produced by dist are stored.
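As a small illustration of that triangular storage, the sketch below builds a dist object from three made-up points (the data are hypothetical, chosen so the distances are easy to check by hand) and inspects how many values it holds.

```r
# Three toy points in the plane (illustrative data only).
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

d <- dist(x)            # Euclidean distances, lower triangle only
n <- attr(d, "Size")    # number of objects

length(d)               # n * (n - 1) / 2 stored values
full <- as.matrix(d)    # expand to the symmetric n x n form
isSymmetric(full)
```

The dist object keeps only n(n-1)/2 values, and as.matrix reconstructs the full symmetric matrix on demand.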
If the distance matrix is sparse because many objects are identical, you may want to compute the distance matrix on the unique objects only.
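A minimal sketch of that idea, assuming you still have the original feature rows (the question only has a precomputed matrix, so this may not apply directly). The data, the key helper, and the choice of k = 2 are all hypothetical.

```r
# Toy data: five objects, several of them exact duplicates.
x <- rbind(c(0, 0), c(1, 1), c(0, 0), c(5, 5), c(1, 1))

# Keep one copy of each distinct row.
ux <- unique(x)

# Hypothetical helper: turn each row into a string key so rows can be matched.
# Note: paste() formats numbers, so this is only safe for exactly equal values.
key <- function(m) apply(m, 1, paste, collapse = "\r")

# For every original row, the index of its representative in ux.
idx <- match(key(x), key(ux))

# Cluster only the distinct rows, then propagate labels to all objects.
hc <- hclust(dist(ux))
groups <- cutree(hc, k = 2)[idx]
print(groups)
```

The dist object here has one entry per pair of distinct rows rather than per pair of original rows, which is where the saving comes from when duplicates are common.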