How to explain clustering results?


Problem description

Say I have a high-dimensional dataset which I assume to be well separable by some kind of clustering algorithm. I run the algorithm and end up with my clusters.

Is there any sort of way (preferably not "hacky" or some kind of heuristic) to explain "what features and thresholds were important in making members of cluster A (for example) part of cluster A"?

I have tried looking at cluster centroids, but this gets tedious with a high-dimensional dataset.
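One way to make centroid inspection less tedious is to rank each centroid's features by how far they deviate from the global mean, in units of the global standard deviation. A minimal sketch with scikit-learn on synthetic data (the dataset, feature count, and cluster count are illustrative, not from the question):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative high-dimensional data; substitute your own feature matrix X.
X, _ = make_blobs(n_samples=300, n_features=20, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Standardize each centroid against the global distribution so that
# large |z| marks the features that make this cluster distinctive.
global_mean = X.mean(axis=0)
global_std = X.std(axis=0)

for k, center in enumerate(km.cluster_centers_):
    z = (center - global_mean) / global_std
    top = np.argsort(-np.abs(z))[:3]  # the 3 most distinctive features
    print(f"cluster {k}: features {top.tolist()}, z-scores {z[top].round(2).tolist()}")
```

This reduces a 20-dimensional centroid to a handful of features per cluster, which scales much better than eyeballing every coordinate.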

I have also tried fitting a decision tree to my clusters and then looking at the tree to determine which decision path most members of a given cluster follow. I have also tried fitting an SVM to my clusters and then using LIME on the samples closest to the centroids to get an idea of which features mattered for classification near the centroids.

However, both of these latter two approaches require using supervised learning in an unsupervised setting and feel "hacky" to me, whereas I'd like something more grounded.

Answer

Do not treat the clustering algorithm as a black box.

Yes, k-means uses centroids. But most algorithms for high-dimensional data don't (and don't use k-means!). Instead, they will often select some features, projections, subspaces, manifolds, etc. So look at what information the actual clustering algorithm provides!
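As one example of using what the model itself exposes: a diagonal-covariance Gaussian mixture provides, for each component, both a mean and a per-feature variance, so you can read off which features a cluster is tight on. A minimal sketch on synthetic data (all names and parameter choices are illustrative):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Illustrative data; substitute your own feature matrix X.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# With covariance_type="diag", covariances_ has shape
# (n_components, n_features): one variance per feature per cluster.
gm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0).fit(X)

for k in range(3):
    tight = np.argsort(gm.covariances_[k])[:3]  # smallest-variance features
    print(f"component {k}: tightest features {tight.tolist()}")
```

Features with small within-cluster variance are the ones the cluster is defined on most sharply, which is information the fitted model hands you directly rather than something recovered by a surrogate classifier.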
