How to set k-Means clustering labels from highest to lowest with Python?

Views: 193 | Published: 2020/4/26 10:22:02 | Tags: python sorting numpy scikit-learn k-means


Question

I have a dataset of 38 apartments and their electricity consumption in the morning, afternoon and evening. I am trying to cluster this dataset using the k-Means implementation from scikit-learn, and am getting some interesting results.

[Figure: first clustering result]

This is all very well, and with 4 clusters I obviously get one of 4 labels assigned to each apartment: 0, 1, 2 or 3. Using the random_state parameter of KMeans, I can fix the seed used to randomly initialize the centroids, so the same apartments consistently receive the same labels.

However, since this case concerns energy consumption, the clusters can be ranked measurably from the lowest to the highest consumers. I would therefore like to assign label 0 to the apartments with the lowest consumption level, label 1 to the apartments that consume a bit more, and so on.

As of now, my labels are [2 1 3 0], or ["black", "green", "blue", "red"]; I would like them to be [0 1 2 3], or ["red", "green", "black", "blue"]. How should I do this while still keeping the centroid initialization random (with a fixed seed)?

Many thanks for the help!

Recommended answer

Transforming the labels through a lookup table is a straightforward way to achieve what you want.

To begin with, I generate some mock data:

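import numpy as np

np.random.seed(1000)  # fixed seed so the mock data is reproducible

n = 38  # number of apartments
# Mock consumption per apartment for each period of the day
X_morning = np.random.uniform(low=.02, high=.18, size=n)
X_afternoon = np.random.uniform(low=.05, high=.20, size=n)
X_night = np.random.uniform(low=.025, high=.175, size=n)
X = np.vstack([X_morning, X_afternoon, X_night]).T  # shape (n, 3)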

Then I perform clustering on the data:

from sklearn.cluster import KMeans
k = 4
# random_state fixes the seed of the random centroid initialization
kmeans = KMeans(n_clusters=k, random_state=0).fit(X)

Finally, I use NumPy's argsort to sort the clusters by the total consumption of their centers and build a lookup table from it:

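# Cluster indices ordered by total consumption of their center, lowest first
idx = np.argsort(kmeans.cluster_centers_.sum(axis=1))
# lut[old_label] gives the new label, i.e. the rank of that cluster
lut = np.zeros_like(idx)
lut[idx] = np.arange(k)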

Sample run:

In [70]: kmeans.cluster_centers_.sum(axis=1)
Out[70]: array([ 0.3214523 ,  0.40877735,  0.26911353,  0.25234873])

In [71]: idx
Out[71]: array([3, 2, 0, 1], dtype=int64)

In [72]: lut
Out[72]: array([2, 3, 1, 0], dtype=int64)

In [73]: kmeans.labels_
Out[73]: array([1, 3, 1, ..., 0, 1, 0])

In [74]: lut[kmeans.labels_]
Out[74]: array([3, 0, 3, ..., 2, 3, 2], dtype=int64)

idx shows the original cluster labels ordered from lowest to highest consumption level, and lut maps each original label to its rank in that ordering. The apartments for which lut[kmeans.labels_] is 0 / 3 belong to the cluster with the lowest / highest consumption level.
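
The lookup-table idea can also be wrapped in a small helper and checked without fitting a model. The following is only a sketch under stated assumptions: relabel_by_total is a hypothetical name (not part of scikit-learn), and the centers and labels are hand-written illustrative values chosen so that their totals follow the same order as in the sample run. It also maps the ordered labels to the colors requested in the question.

```python
import numpy as np


def relabel_by_total(centers, labels):
    """Relabel clusters so that label 0 has the lowest total consumption.

    Hypothetical helper for illustration; returns (new_labels, sorted_centers).
    """
    order = np.argsort(centers.sum(axis=1))  # old labels, lowest total first
    lut = np.empty_like(order)
    lut[order] = np.arange(len(order))       # lut[old_label] -> new_label
    return lut[labels], centers[order]


# Illustrative (made-up) cluster centers: morning, afternoon, night
centers = np.array([
    [0.10, 0.12, 0.10],  # old label 0, total 0.32
    [0.14, 0.15, 0.12],  # old label 1, total 0.41
    [0.08, 0.10, 0.09],  # old label 2, total 0.27
    [0.07, 0.10, 0.08],  # old label 3, total 0.25
])
labels = np.array([1, 3, 1, 0, 1, 0])  # made-up labels for six apartments

new_labels, sorted_centers = relabel_by_total(centers, labels)
print(new_labels)  # [3 0 3 2 3 2]

# Row i of sorted_centers is the center of new label i, so totals are ascending
totals = sorted_centers.sum(axis=1)

# Colors from the question, indexed by the ordered label
colors = np.array(["red", "green", "black", "blue"])[new_labels]
```

With a fitted model, the same helper would be called as relabel_by_total(kmeans.cluster_centers_, kmeans.labels_). Returning the reordered centers alongside the labels keeps the two consistent, since kmeans.cluster_centers_ itself stays in the original order.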
