k-means clustering algorithm


Question

I want to perform a k-means clustering analysis on a set of 10 data points, each of which has an array of 4 numeric values associated with it. I'm using the Pearson correlation coefficient as the distance metric. I have done the first two steps of the k-means clustering algorithm, which were:

1) Select a set of initial centres for the k clusters. [I selected two initial centres at random]

2) Assign each object to the cluster with the closest centre. [I used the Pearson correlation coefficient as the distance metric; see the data below]
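Step 2 might be sketched in Python roughly as follows. The question does not say how the correlation is turned into a distance, so the `1 - r` conversion here is an assumption, and the names `pearson`, `pearson_distance` and `assign` are purely illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant (zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pearson_distance(x, y):
    # Assumed conversion: correlation +1 gives distance 0, correlation -1 gives 2.
    return 1.0 - pearson(x, y)

def assign(points, centres):
    """Map each point label to the index of its closest centre."""
    return {
        label: min(range(len(centres)),
                   key=lambda k: pearson_distance(vec, centres[k]))
        for label, vec in points.items()
    }
```

With `points` as a dict such as `{"B": [21, 33, 21, 23], ...}` and `centres` as a list of two 4-element vectors, `assign(points, centres)` returns the cluster index (0 or 1) for each label.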

Now I need help understanding the 3rd step in the algorithm:

3) Compute the new centres of the clusters:

$$C(S) = \frac{1}{n} \sum_{i=1}^{n} X_i$$

where each X_i, in this case, is a 4-dimensional vector in cluster S and n is the number of data points in the cluster.

How would I go about calculating C(S) for, say, the following data?

# Cluster 1
A   10  15  20  25  # randomly chosen centre
B   21  33  21  23
C   43  14  23  23
D   37  45  43  49
E   40  43  32  32

# Cluster 2
F  100  102  143  212  # randomly chosen centre
G  303  213  212  302
H  102  329  203  212
I   32  201  430   48
J   60   99   87   34

The last step of the k-means algorithm is to repeat steps 2 and 3 until no object changes cluster, which is simple enough.
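The repeat-until-stable loop could look something like the sketch below. It assumes a `1 - r` Pearson distance, treats a zero-variance vector as uncorrelated, keeps the old centre if a cluster ends up empty, and caps the number of iterations; all of these are illustrative choices rather than anything stated in the question. The cap is there because the usual k-means convergence argument is tied to squared Euclidean distance and may not carry over to a correlation-based distance.

```python
from math import sqrt

def pearson_distance(x, y):
    """1 - Pearson correlation (assumed conversion to a distance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    if denom == 0:
        return 1.0  # assumption: a constant vector is treated as uncorrelated
    return 1.0 - cov / denom

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def kmeans(points, initial_centres, max_iter=100):
    """Repeat steps 2 and 3 until no point changes cluster (or max_iter is hit)."""
    centres = [list(c) for c in initial_centres]  # copy so the input is untouched
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centre.
        new_assignment = {
            label: min(range(len(centres)),
                       key=lambda k: pearson_distance(vec, centres[k]))
            for label, vec in points.items()
        }
        if new_assignment == assignment:
            break  # no object changed cluster
        assignment = new_assignment
        # Step 3: recompute each centre as the mean of its members.
        for k in range(len(centres)):
            members = [points[l] for l, c in assignment.items() if c == k]
            if members:  # keep the old centre if the cluster is empty
                centres[k] = centroid(members)
    return assignment, centres
```

Here `points` maps labels to 4-element vectors and `initial_centres` is the list of randomly chosen starting centres from step 1.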

I need help with step 3: computing the new centres of the clusters. If someone could go through and explain how to compute the new centre of just one of the clusters, that would help me immensely.

Recommended answer

Step 3 corresponds to calculating the mean of each cluster. For cluster 1, the new cluster center is (B+C+D+E) / 4, which is (35.25 33.75 29.75 31.75); i.e. sum each component separately over all the points in the cluster, then divide by the number of points in the cluster.

The previous cluster center (A for cluster 1) is usually not included in the calculation of the new cluster center.
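As a quick check of the arithmetic, here is a minimal Python sketch of the per-component mean. It follows the answer in leaving the old centres A and F out of the sum; the helper name `centroid` is just illustrative. For cluster 1 the component sums are 141, 135, 119 and 127, so the fourth component works out to 127 / 4 = 31.75.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

# Cluster 1 members, excluding the old centre A (as the answer does).
cluster1 = [
    [21, 33, 21, 23],  # B
    [43, 14, 23, 23],  # C
    [37, 45, 43, 49],  # D
    [40, 43, 32, 32],  # E
]

# Cluster 2 members, excluding the old centre F.
cluster2 = [
    [303, 213, 212, 302],  # G
    [102, 329, 203, 212],  # H
    [32, 201, 430, 48],    # I
    [60, 99, 87, 34],      # J
]

print(centroid(cluster1))  # [35.25, 33.75, 29.75, 31.75]
print(centroid(cluster2))  # [124.25, 210.5, 233.0, 149.0]
```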
