Cluster one-dimensional data using pvclust


Question

Thanks for taking the time to read this question. I have some one-dimensional data to cluster in R. The basic hclust command works fine, but the pvclust command does not accept one-dimensional data and keeps reporting:

Error in hclust(distance, method = method.hclust) : 
  must have n >= 2 objects to cluster

I found a workaround: I added some all-zero rows to the data, so the data becomes:

       [,1]   [,2]   [,3]  [,4]  [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
[1,]  7.424 14.251 15.957 1.542 2.451 20.836 13.534 20.003 12.555 10.817
[2,]      0      0      0     0     0      0      0      0      0      0
[3,]      0      0      0     0     0      0      0      0      0      0
[4,]      0      0      0     0     0      0      0      0      0      0

Then I ran pvclust, and it worked!

But I am concerned that this workaround breaks the mathematics behind pvclust. Can anyone tell me whether I am right or wrong, and whether there is a better solution to my problem?

Thanks!

Recommended answer

First of all, let me point out that none of these methods is appropriate for one-dimensional data.

For one-dimensional data, use a method that exploits the fact that the data can be sorted. For example, use a method based on kernel density estimation.

The term "cluster analysis" is usually used with multidimensional data only. In one dimension, there are much better methods. See also "natural breaks optimization", but IMHO you should be using kernel density estimation (KDE): split the data at the local minima of the KDE.
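Splitting at the local minima of a KDE can be sketched in base R as follows. This is only an illustration: the values are the first row of the matrix in the question, the default bandwidth of density() is an assumption rather than a recommendation, and the names kde, cuts and groups are made up for this example.

```r
# Values taken from the first row of the matrix shown in the question.
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)

# Kernel density estimate on a regular grid (default bandwidth: an assumption).
kde <- density(x, n = 512)

# A grid point is a local minimum where the slope of the density
# changes from negative to positive.
slope_sign <- sign(diff(kde$y))
min_idx <- which(diff(slope_sign) > 0) + 1
cuts <- kde$x[min_idx]

# Assign each value to the interval between consecutive cut points.
# If the KDE has no local minimum, everything ends up in a single group.
groups <- cut(x, breaks = c(-Inf, cuts, Inf), labels = FALSE)

# Show the values of each group in sorted order.
split(sort(x), groups[order(x)])
```

The number of groups depends strongly on the bandwidth. Passing adjust (for example density(x, adjust = 0.5)) narrows the kernel and tends to produce more splits, so it is worth plotting kde before trusting the cut points.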

Now to your actual question. Most likely the problem is that you are ... passing one-dimensional data, which is interpreted as a single record with d dimensions, so the method complains that it has only one sample. You may have success by first transposing your data.
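The effect of orientation can be checked with base R alone. The sketch below uses hclust() and dist() rather than pvclust, and the names row_vec, col_vec and res are hypothetical. dist() treats each row as one object, so a single-row matrix contains only one object to cluster.

```r
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)

row_vec <- matrix(x, nrow = 1)  # 1 object with 10 dimensions
col_vec <- t(row_vec)           # 10 objects with 1 dimension each

dim(row_vec)
dim(col_vec)

# With a single row there is only one object, so hclust() refuses to run.
# The error message is captured instead of stopping the script.
res <- tryCatch(hclust(dist(row_vec)),
                error = function(e) conditionMessage(e))
res

# After transposing there are 10 objects and clustering proceeds.
hc <- hclust(dist(col_vec), method = "average")
cutree(hc, k = 3)
```

As far as its documentation describes, pvclust clusters the columns of its input rather than the rows, so the orientation that works there may be the opposite of what hclust(dist(...)) expects. Checking dim() on the object before calling either function is the safest way to see what is being treated as a sample.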

With your hack of adding zero records, the result most likely becomes bogus. You are probably clustering a data set that has 1 vector containing your data and 3 vectors that are all zero...
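One way to see why the padded data is suspect, assuming pvclust is run with its default correlation-based distance computed between columns (an assumption about the call, which the question does not show): once three zero rows are appended, every column has the form (value, 0, 0, 0), and any two such columns are perfectly correlated. The names m and d below are hypothetical.

```r
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)

# Rebuild the padded 4 x 10 matrix from the question.
m <- rbind(x, 0, 0, 0)

# Correlation distance between columns: 1 - cor.
d <- 1 - cor(m)

# Largest pairwise distance; it is zero up to floating-point error,
# so the actual values in the first row no longer influence the tree.
max(abs(d))
```

Under that assumption, the dendrogram and its p-values would reflect tie-breaking and resampling noise rather than any structure in the ten numbers.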

But in the end, you should not be using these methods here anyway! Use a method that exploits the fact that your data can be sorted.
