Python clustering 'purity' metric

Published: 2022/8/7 14:20:03. Tags: python, scikit-learn, cluster-analysis

Question

I am clustering my dataset with a Gaussian Mixture Model (GMM) from sklearn.mixture.

I can use the score() function to compute the (average) log-likelihood of the data under this model.
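For reference, a minimal sketch of what score() returns, assuming the current GaussianMixture estimator (the old GMM class was removed in scikit-learn 0.20). It is the mean of the per-sample log-likelihoods from score_samples(); the random data and the seed are illustrative only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.random((1000, 2))  # illustrative random data, like the X in the question

gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
gmm.fit(X)

# score(): average log-likelihood per sample
avg_log_likelihood = gmm.score(X)
# score_samples(): one log-likelihood value per sample
per_sample = gmm.score_samples(X)

print(avg_log_likelihood, per_sample.mean())  # the two numbers agree
```

Note that this is an unsupervised quantity: unlike purity, it never looks at ground-truth class labels.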

However, I am looking for the metric called 'purity' defined in this article.
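For reference, the standard definition of purity (as given, for example, in Manning, Raghavan and Schütze's Introduction to Information Retrieval; assumed here to match the linked article) is shown below, where Ω = {ω_1, ..., ω_K} are the clusters, C = {c_1, ..., c_J} are the true classes and N is the number of samples. Each cluster is credited with the count of its most frequent class, so purity is the fraction of samples that belong to their cluster's majority class. Inverse purity swaps the roles of clusters and classes.

```latex
\mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k=1}^{K} \max_{j} \lvert \omega_k \cap c_j \rvert
\qquad
\mathrm{inverse\ purity}(\Omega, C) = \frac{1}{N} \sum_{j=1}^{J} \max_{k} \lvert c_j \cap \omega_k \rvert
```

Purity reaches 1 when every cluster contains a single class, which also happens trivially if every sample gets its own cluster.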

How can I implement it in Python? My current code looks like this:

import numpy as np
from sklearn.mixture import GaussianMixture  # replaces the old GMM class (removed in scikit-learn 0.20)

# X is a 1000 x 2 array (1000 samples of 2 coordinates).
# It is actually a 2-dimensional PCA projection of data
# extracted from the MNIST dataset, but this random array
# is equivalent as far as the code is concerned.
X = np.random.rand(1000, 2)

# 3 components with diagonal covariances, as in the old GMM(3, 'diag') call
clusterer = GaussianMixture(n_components=3, covariance_type='diag')
clusterer.fit(X)
cluster_labels = clusterer.predict(X)

# Now I can count the samples assigned to each cluster...
count0, count1, count2 = np.bincount(cluster_labels, minlength=3)

But I don't know how to loop over each cluster to compute the confusion matrix (as described in this question).
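Looping over the clusters is straightforward once ground-truth labels are available. Below is a minimal NumPy-only sketch, assuming y_true holds the true class of every sample (for example the MNIST digit labels; the random X above has none). The function name purity_by_loop is hypothetical.

```python
import numpy as np

def purity_by_loop(y_true, y_pred):
    """Purity computed by looping over clusters (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = 0
    for cluster in np.unique(y_pred):
        # true labels of the samples assigned to this cluster
        members = y_true[y_pred == cluster]
        # count of the most frequent true label = this cluster's "correct" samples
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()
    return correct / len(y_true)

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 2, 2]
print(purity_by_loop(y_true, y_pred))  # 5/6, about 0.833
```

Cluster 0 holds true labels {0, 0}, cluster 1 holds {0, 1} and cluster 2 holds {1, 1}, so the majority counts are 2 + 1 + 2 = 5 out of 6 samples.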

Recommended answer

David's answer works, but here is another approach.

import numpy as np
from sklearn import metrics

def purity_score(y_true, y_pred):
    # compute contingency matrix (also called confusion matrix)
    contingency_matrix = metrics.cluster.contingency_matrix(y_true, y_pred)
    # return purity
    return np.sum(np.amax(contingency_matrix, axis=0)) / np.sum(contingency_matrix)
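A usage sketch, assuming a dataset that comes with ground-truth labels: here synthetic blobs from make_blobs stand in for MNIST and its digit labels, since purity cannot be computed on the unlabeled random X from the question. purity_score is repeated so the snippet runs on its own.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

def purity_score(y_true, y_pred):
    # rows = true classes, columns = predicted clusters
    contingency_matrix = metrics.cluster.contingency_matrix(y_true, y_pred)
    # for each cluster (column), keep the count of its majority class
    return np.sum(np.amax(contingency_matrix, axis=0)) / np.sum(contingency_matrix)

# toy check: contingency matrix is [[2, 1, 0], [0, 1, 2]], column maxima sum to 5
toy = purity_score([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(toy)  # 5/6

# labeled synthetic data standing in for the MNIST projection + digit labels
X, y_true = make_blobs(n_samples=1000, centers=3, n_features=2, random_state=0)
gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
cluster_labels = gmm.fit_predict(X)
p = purity_score(y_true, cluster_labels)
print(p)
```

Because only the majority count per cluster matters, purity does not depend on how the clusters are numbered, so no matching between cluster ids and class ids is needed.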

If you also need the inverse purity, simply replace "axis=0" with "axis=1".
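A minimal sketch of that variant; the name inverse_purity_score is hypothetical. With axis=1 the maximum is taken for each true class (row), i.e. each class is credited with the samples that land in its best-matching cluster. The two degenerate clusterings below show why purity and inverse purity are usually reported together: each one can be pushed to 1.0 on its own.

```python
import numpy as np
from sklearn import metrics

def inverse_purity_score(y_true, y_pred):
    # hypothetical helper: identical to purity_score except for axis=1,
    # i.e. for each true class (row) keep the count in its best-matching cluster
    contingency_matrix = metrics.cluster.contingency_matrix(y_true, y_pred)
    return np.sum(np.amax(contingency_matrix, axis=1)) / np.sum(contingency_matrix)

y_true = [0, 0, 0, 1, 1, 1]
a = inverse_purity_score(y_true, [0, 0, 1, 1, 2, 2])  # 4/6 (purity is 5/6)
b = inverse_purity_score(y_true, [0, 1, 2, 3, 4, 5])  # 2/6 (purity would be 1.0)
c = inverse_purity_score(y_true, [0, 0, 0, 0, 0, 0])  # 1.0 (purity would be 0.5)
print(a, b, c)
```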
