How do I find which cluster my data belongs to using Python?
Question
I just ran PCA and then the K-means clustering algorithm on my data, and after running it I get 3 clusters. I am trying to figure out which cluster each input belongs to, in order to gather some qualitative attributes about the inputs. My input is a customer ID, and the variables I used for clustering were the spending patterns on certain products.
Below is the code I ran for K-means. I am looking for input on how to map the results back to the source data to see which cluster each input belongs to:
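from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# x_10d: the PCA-reduced data, one row per customer
kmeans = KMeans(n_clusters=3)
X_clustered = kmeans.fit_predict(x_10d)  # one cluster label (0, 1 or 2) per row

LABEL_COLOR_MAP = {0: 'r', 1: 'g', 2: 'b'}
label_color = [LABEL_COLOR_MAP[l] for l in X_clustered]

# plot the scatter diagram (first vs. third principal component)
plt.figure(figsize=(7, 7))
plt.scatter(x_10d[:, 0], x_10d[:, 2], c=label_color, alpha=0.5)
plt.show()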
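To find the cluster of a customer who was not part of the data the model was fitted on, the new row has to go through the same fitted PCA before K-means. The sketch below shows this on made-up data; the names `spend`, `pca`, `new_customer` and the `customer_id` index are hypothetical stand-ins for your own data, and `n_init=10`, `random_state=0` are assumptions added only to make the run reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical spend data: one row per customer, one column per product.
rng = np.random.default_rng(0)
spend = pd.DataFrame(
    rng.gamma(2.0, 50.0, size=(60, 12)),
    index=pd.Index([f"C{i:03d}" for i in range(60)], name="customer_id"),
    columns=[f"product_{j}" for j in range(12)],
)

# Fit PCA and K-means once, keeping both fitted objects around.
pca = PCA(n_components=10)
x_10d = pca.fit_transform(spend)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
X_clustered = kmeans.fit_predict(x_10d)

# A new customer must be projected with the *same* fitted PCA before K-means.
new_customer = pd.DataFrame(
    rng.gamma(2.0, 50.0, size=(1, 12)), columns=spend.columns
)
new_label = kmeans.predict(pca.transform(new_customer))[0]
print(new_label)
```

Refitting PCA on the new row alone would produce coordinates in a different space, so keeping the fitted `pca` object is the key design point here.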
Thanks
Recommended answer
If you want to add the cluster labels back into your dataframe, and assuming x_10d is a pandas DataFrame, you can do:
x_10d["cluster"] = X_clustered
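This adds a new column called "cluster" to your dataframe, containing the cluster label for each row. Because fit_predict returns one label per input row, in the same order as the rows, the i-th label belongs to the i-th row.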
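In the question, `x_10d[:, 0]` is NumPy-style slicing, and PCA output is typically a NumPy array rather than a DataFrame, so assigning a string-named column to `x_10d` itself would fail. A minimal sketch, assuming your source data is a DataFrame indexed by customer ID (the names `spend`, `clustered`, `profile` and the synthetic values are hypothetical), attaches the labels to the source data instead and then averages spend per cluster as a starting point for the qualitative profiling:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical source data: customer IDs as the index, product spend as columns.
rng = np.random.default_rng(1)
spend = pd.DataFrame(
    rng.gamma(2.0, 50.0, size=(60, 12)),
    index=pd.Index([f"C{i:03d}" for i in range(60)], name="customer_id"),
    columns=[f"product_{j}" for j in range(12)],
)

# PCA output is a NumPy array with the same row order as `spend`.
x_10d = PCA(n_components=10).fit_transform(spend)
X_clustered = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(x_10d)

# Attach the labels to the source DataFrame rather than to the PCA array.
clustered = spend.assign(cluster=X_clustered)

# Which cluster does a given customer belong to?
print(clustered.loc["C007", "cluster"])

# Average spend per product in each cluster.
profile = clustered.groupby("cluster").mean()
print(profile.round(1))
```

If the customer ID is a regular column rather than the index, calling `set_index` on that column first gives the same lookup behaviour.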