How do I obtain individual centroids of k-means clusters using nltk (Python)?


Question

I used nltk to perform k-means clustering because I wanted to change the distance metric to cosine distance. However, how do I obtain the centroids of all the clusters?

kclusterer = KMeansClusterer(8, distance = nltk.cluster.util.cosine_distance, repeats = 1)
predict = kclusterer.cluster(features, assign_clusters = True)
centroids = kclusterer._centroid
df_clustering['cluster'] = predict
#df_clustering['centroid'] = centroids[df_clustering['cluster'] - 1].tolist()
df_clustering['centroid'] = centroids

I am trying to perform k-means clustering on a pandas dataframe, and I would like the dataframe column 'centroid' to hold, for each data point, the coordinates of the centroid of the cluster that point belongs to.

Thanks in advance!

Recommended answer

import numpy as np
import pandas as pd
import nltk
from nltk.cluster import KMeansClusterer

# Dummy dataframe with 3 features
df = pd.DataFrame([[1, 2, 3], [50, 51, 52], [2.0, 6.0, 8.5], [50.11, 53.78, 52]],
                  columns=['feature1', 'feature2', 'feature3'])
print(df)

# Ask for 2 clusters and use cosine distance as the metric
obj = KMeansClusterer(2, distance=nltk.cluster.util.cosine_distance)
vectors = [np.array(f) for f in df.values]

# With assign_clusters=True, cluster() returns one cluster index per input vector
df['predicted_cluster'] = obj.cluster(vectors, assign_clusters=True)

# means() returns one mean vector (centroid) per cluster
print(obj.means())
# Output: the mean of the three features for each of the 2 clusters.
# The order of the clusters can differ between runs because the initial means are chosen at random.
# [array([50.055, 52.39 , 52.   ]), array([1.5 , 4.  , 5.75])]

# To put each row's cluster center in the pandas dataframe,
# index the list of means with the row's cluster index
df['centroid'] = df['predicted_cluster'].apply(lambda x: obj.means()[x])
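
To make the lookup step more concrete, here is a minimal plain-Python sketch that does not need nltk. It takes the two means from the printed output above, labels each row of the dummy dataframe with its nearest mean, and then looks up the centroid for every row. The `cosine_distance` function here is a hand-written stand-in, assumed to follow the usual definition (1 minus the cosine similarity), not the library function itself, and `rows`, `labels` and `row_centroids` are names made up for this illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); assumed to mirror nltk's definition."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)


# Cluster means copied from the printed output above.
means = [[50.055, 52.39, 52.0], [1.5, 4.0, 5.75]]

# Rows of the dummy dataframe.
rows = [[1, 2, 3], [50, 51, 52], [2.0, 6.0, 8.5], [50.11, 53.78, 52]]

# Label each row with the 0-based index of its nearest mean.
labels = [
    min(range(len(means)), key=lambda i: cosine_distance(row, means[i]))
    for row in rows
]

# The labels are 0-based, so they index the list of means directly.
row_centroids = [means[label] for label in labels]

for row, label, centroid in zip(rows, labels, row_centroids):
    print(row, label, centroid)
```

Because the labels start at 0, they can be used as list indices as they are; subtracting 1, as in the commented-out line in the question, would shift every row to the wrong centroid.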
