How to use silhouette score in k-means clustering from sklearn library?

Views: 30 | Published: 2021/12/25 14:55:45 | Tags: python-2.7 machine-learning scikit-learn k-means silhouette

Question

I'd like to use the silhouette score in my script to automatically determine the number of clusters for k-means clustering with sklearn.

import numpy as np
import pandas as pd
import csv
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

filename = "CSV_BIG.csv"

# Read the CSV file with the Pandas lib.
path_dir = ".\\"
dataframe = pd.read_csv(path_dir + filename, encoding = "utf-8", sep = ';' ) # "ISO-8859-1")
df = dataframe.copy(deep=True)

#Use silhouette score
range_n_clusters = list (range(2,10))
print ("Number of clusters from 2 to 9: \n", range_n_clusters)

for n_clusters in range_n_clusters:
    clusterer = KMeans (n_clusters=n_clusters).fit(?)
    preds = clusterer.predict(?)
    centers = clusterer.cluster_centers_

    score = silhouette_score (?, preds, metric='euclidean')
    print ("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))

Can someone help me with the question marks? I don't understand what to put in their place. I took the code from an example. The commented part is the previous version, where I do k-means clustering with the number of clusters fixed at 4. The code works that way, but in my project I need to choose the number of clusters automatically.

Recommended answer

I am assuming you want to use the silhouette score to find the optimal number of clusters.

First create a separate KMeans object, then call its fit_predict function on your data df, like this:

for n_clusters in range_n_clusters:
    clusterer = KMeans(n_clusters=n_clusters)
    preds = clusterer.fit_predict(df)
    centers = clusterer.cluster_centers_

    score = silhouette_score(df, preds)
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))

See the official example for more clarity.
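
The loop in the answer prints one score per candidate but stops short of picking a value. A minimal sketch of that last step follows, assuming the highest silhouette score is the selection rule. The helper name best_n_clusters, the synthetic make_blobs data with hand-picked centers, and the n_init and random_state settings are all illustrative assumptions, not part of the original question or answer; in the asker's script, X would be the numeric columns of df.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_n_clusters(X, candidates, random_state=0):
    """Hypothetical helper: return (best_k, scores) for the given candidates.

    scores maps each candidate k to its mean silhouette score; best_k is
    the candidate with the highest score.
    """
    scores = {}
    for k in candidates:
        # fit_predict fits the model and returns one cluster label per row.
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels, metric="euclidean")
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the asker's data: four tight, well-separated blobs.
# The centers are made up for illustration only.
X, _ = make_blobs(n_samples=300,
                  centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
                  cluster_std=0.5, random_state=42)

best_k, scores = best_n_clusters(X, range(2, 10))
for k in sorted(scores):
    print("For n_clusters = {}, silhouette score is {:.3f}".format(k, scores[k]))
print("Selected number of clusters: {}".format(best_k))
```

Keeping all scores in a dictionary, rather than only tracking the maximum, makes it easy to inspect or plot them afterwards. The silhouette score is only defined for 2 to n_samples - 1 clusters, which is why the candidate range starts at 2.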
