Sparse Matrix as input to Hierarchical clustering in R

Question

I have a question about clustering with a distance matrix that is sparse.

Is there a sparse distance object format that does not expand the matrix and can work directly with the sparse representation?

Currently I'm doing the following:

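# read the sparse matrix (readMM comes from the Matrix package)
library(Matrix)
sparse <- readMM('sparse-matrix')
# convert to a dist object; this is the step that fails
distance <- as.dist(sparse)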

sparse-matrix is already the correct distance matrix, with NA for entries that are not connected.

> sparse
[1,] . . .
[2,] 1 . .
[3,] 1 . .

> as.dist(sparse)
  1 2
2 1  
3 1 0

But converting it with as.dist fails with:

Error in asMethod(object) : negative length vectors are not allowed

Presumably this is because it expands the matrix to its full dense form. The matrix is N x N with N = 49281. The dist object format is required by, for example, the hclust method.
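A quick back-of-the-envelope check supports that guess. The sketch below only does arithmetic on N; the link to the error message is an assumption (a dense N x N matrix would need more elements than a 32-bit integer length can express), not something confirmed in the question.

```r
# Size check for N = 49281 (pure arithmetic, no packages needed)
n <- 49281

# Number of entries in the fully expanded N x N matrix
n_full <- n^2

# TRUE: the dense matrix has more entries than the largest 32-bit integer,
# which is one plausible source of a "negative length" error
n_full > .Machine$integer.max

# Number of entries a dist object stores (lower triangle only)
n_dist <- n * (n - 1) / 2

# Approximate memory for the dist object as doubles, in GiB
n_dist * 8 / 2^30
```

Even the triangular dist form holds about 1.2 billion values, roughly 9 GiB as doubles, so hclust on the full problem is heavy regardless of how the input is stored.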

A similar question on the R help list did not receive any answer.

Recommended answer

How would a distance matrix be sparse? There is a distance between every pair of objects, so it is actually a very dense matrix. However, because the matrix is symmetric (D = D'), a triangular matrix is sufficient to describe the mutual distances. This is in fact how the objects produced by dist are stored.
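To illustrate the triangular storage, here is a minimal sketch on three made-up 2-D points (the data are purely illustrative). A dist object for n objects holds n(n-1)/2 values, one per pair.

```r
# Three illustrative points: (0,0), (3,4), (6,8)
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

# Euclidean distances between all pairs of rows
d <- dist(x)

# Only the lower triangle is stored: 3 * (3 - 1) / 2 = 3 values
length(d)

# The number of objects is kept as an attribute
attr(d, "Size")

# Stored order is column-wise over the lower triangle: d(2,1), d(3,1), d(3,2)
as.vector(d)
```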

If the distance matrix is sparse because many of the objects are identical, then you may want to calculate the distance matrix only on the unique objects.
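One way this could look, as a rough sketch: it assumes you still have the original feature matrix (here a small made-up `x`), which the question does not show. Duplicated rows are collapsed, only the unique rows are clustered, and the cluster labels are then mapped back to every original row.

```r
# Illustrative data: rows 1-2 and rows 3-4 are exact duplicates
x <- matrix(c(0, 0,
              0, 0,
              5, 5,
              5, 5,
              0, 1), ncol = 2, byrow = TRUE)

# Build one string key per row so duplicates can be matched
key <- apply(x, 1, paste, collapse = "_")
first <- !duplicated(key)

# Keep a single representative of each distinct row
ux <- x[first, , drop = FALSE]

# For every original row, the index of its representative in ux
idx <- match(key, key[first])

# Cluster only the unique rows (default complete linkage)
hc <- hclust(dist(ux))
groups_unique <- cutree(hc, k = 2)

# Expand the labels back to all original rows
groups_all <- groups_unique[idx]
groups_all
```

With complete or single linkage the duplicates do not change the merge heights, so clustering the representatives is equivalent. For size-dependent linkages such as average, hclust has a `members` argument that can carry the count of rows behind each representative.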
