Clustering 1D data and representing clusters on a matplotlib histogram

Question

I have 1D data in the format of:

areas = ...
plt.figure(figsize=(10, 10))
plt.hist(areas, bins=80)
plt.show()

The resulting histogram looks something like this (image not included):

Now I want to cluster this data. I know that I can use either Kernel Density Estimation or K-Means. But once I have the cluster assignments, how do I represent the clusters on the histogram?

Answer

You just need to figure out the cluster assignment of each point, and then plot each cluster's subset of the data as its own histogram, taking care to pass the same bin edges every time so that the bars line up.

import numpy as np
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans

import matplotlib as mpl
mpl.rcParams['axes.spines.top'] = False
mpl.rcParams['axes.spines.right'] = False

# simulate some fake data
n = 10000
mu1, sigma1 = 0, 1
mu2, sigma2 = 6, 2
a = mu1 + sigma1 * np.random.randn(n)
b = mu2 + sigma2 * np.random.randn(n)
data = np.concatenate([a, b])

# determine which K-Means cluster each point belongs to
cluster_id = KMeans(n_clusters=2, n_init=10).fit_predict(data.reshape(-1, 1))

# determine densities by cluster assignment and plot
fig, ax = plt.subplots()
bins = np.linspace(data.min(), data.max(), 40)
for ii in np.unique(cluster_id):
    subset = data[cluster_id==ii]
    ax.hist(subset, bins=bins, alpha=0.5, label=f"Cluster {ii}")
ax.legend()
plt.show()
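
The question also mentions Kernel Density Estimation. One possible way to turn a KDE into cluster assignments (a rough sketch, not the method from the answer above) is to cut the data at the local minima of the estimated density. The use of scipy's gaussian_kde with its default bandwidth, the 500-point grid, the seeded sample sizes, and the helper name plot_clusters are all assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import argrelextrema
from scipy.stats import gaussian_kde

# simulate some fake data (seed and sample sizes are arbitrary assumptions)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 2000), rng.normal(6, 2, 2000)])

# estimate the density on a regular grid spanning the data
kde = gaussian_kde(data)  # default (Scott's rule) bandwidth
grid = np.linspace(data.min(), data.max(), 500)
density = kde(grid)

# treat local minima of the density as boundaries between clusters
cuts = grid[argrelextrema(density, np.less)[0]]

# points left of the first cut get id 0, the next interval id 1, and so on
cluster_id = np.digitize(data, cuts)


def plot_clusters(data, cluster_id, cuts, n_bins=40):
    """Hypothetical helper: one histogram per cluster on shared bin edges."""
    fig, ax = plt.subplots()
    bins = np.linspace(data.min(), data.max(), n_bins)
    for ii in np.unique(cluster_id):
        ax.hist(data[cluster_id == ii], bins=bins, alpha=0.5, label=f"Cluster {ii}")
    for cut in cuts:
        ax.axvline(cut, color="k", linestyle="--")  # mark each KDE boundary
    ax.legend()
    return fig, ax


fig, ax = plot_clusters(data, cluster_id, cuts)
```

Call plt.show() afterwards to display the figure. With a small bandwidth or sparse tails the density can have extra shallow minima, which would split the data into more clusters than expected, so the bandwidth may need tuning for real data.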
