Ordered colored plot after clustering using Python

Problem description

I have a 1D array called data=[5 1 100 102 3 4 999 1001 5 1 2 150 180 175 898 1012]. I am using Python's scipy.cluster.vq to find clusters in it; there are 3 clusters in the data. After clustering, when I try to plot the data, it is no longer in order.

It would be great if the data could be plotted in the same order as it is given, with the sections that belong to different groups or clusters drawn in different colors.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, vq

data = np.loadtxt('rawdata.csv', delimiter=' ')
#----------------------kmeans------------------
centroid,_ = kmeans(data, 3)   # three cluster centers
idx,_ = vq(data, centroid)     # cluster label (0, 1 or 2) of each point
x=np.linspace(0,(len(data)-1),len(data))

fig = plt.figure(1)
plt.plot(x,data)               # the full series, in its original order
# only y values are passed below, so each cluster is drawn at x = 0, 1, 2, ...
plot1=plt.plot(data[idx==0],'ob')
plot2=plt.plot(data[idx==1],'or')
plot3=plt.plot(data[idx==2],'og')
plt.show()

Here is my plot: http://s29.postimg.org/9gf7noe93/figure_1.png (the blue line in the background is in order; after clustering, the order is messed up).

Thanks!
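Why the markers end up out of order: when plt.plot is given only y values, it draws them against 0, 1, 2 and so on, and data[idx==0] is a compacted copy of cluster 0's values, so their original positions are lost. The sketch below shows this on the 16-point sample from the question. The labels come from hand-picked thresholds, an assumption standing in for the vq output so the example runs without SciPy.

```python
import numpy as np

# The 16-point sample array from the question
data = np.array([5, 1, 100, 102, 3, 4, 999, 1001, 5, 1, 2, 150, 180, 175, 898, 1012],
                dtype=float)

# Hypothetical labels standing in for idx from vq(data, centroid):
# 0 = small values, 1 = roughly 100 to 180, 2 = roughly 900 and up
idx = np.where(data < 50, 0, np.where(data < 500, 1, 2))

subset = data[idx == 2]               # boolean indexing returns a compacted copy
positions = np.flatnonzero(idx == 2)  # original indices of those same points

# plt.plot(subset) would draw these values at x = 0..3;
# plt.plot(positions, subset) draws them where they belong
print(subset)
print(positions)
```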

Update:

I wrote the following code to produce an in-order colored plot after clustering:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, vq

data = np.loadtxt('rawdata.csv', delimiter=' ')
#----------------------kmeans-----------------------------
centroid, _ = kmeans(data, 3)  # three clusters
idx, _ = vq(data, centroid)    # cluster label of every point, in original order
x = np.arange(len(data))       # original positions 0 .. len(data)-1
fig = plt.figure(1)
plt.plot(x, data)

styles = ['ob', 'or', 'og']    # marker style for clusters 0, 1, 2
for i in range(len(data)):     # visit every point, including the last one
    # idx[i] already is the point's cluster; no need to search data[idx == k]
    plt.plot(x[i], data[i], styles[idx[i]])
plt.show()

The problem with the above code is that it's too slow, and my array has over 3 million elements, so this code would take forever to finish. I'd really appreciate it if someone could provide a vectorized version of the code above. Thanks!
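A minimal sketch of a vectorized version, assuming idx holds one cluster label per point as returned by vq: select each cluster with a boolean mask and pass the matching x positions too, so the plot needs one plot call per cluster instead of one per point. plot_clusters_in_order is a hypothetical helper name, and the threshold-based idx is only a stand-in for the k-means labels. The Agg backend line just lets the sketch run without a display; remove it and call plt.show() for an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it for an interactive window
import matplotlib.pyplot as plt
import numpy as np


def plot_clusters_in_order(ax, data, idx, styles=("ob", "or", "og")):
    """Hypothetical helper: draw the raw series, then one marker set per cluster."""
    x = np.arange(len(data))                       # original position of every point
    ax.plot(x, data, color="lightgray", zorder=1)  # full series in input order
    markers = []
    for k, style in enumerate(styles):
        mask = idx == k                            # True where the point is in cluster k
        # passing x[mask] keeps each point at its original index
        (line,) = ax.plot(x[mask], data[mask], style, zorder=2)
        markers.append(line)
    return markers


data = np.array([5, 1, 100, 102, 3, 4, 999, 1001, 5, 1, 2, 150, 180, 175, 898, 1012],
                dtype=float)
# Stand-in labels; in the question these come from idx, _ = vq(data, centroid)
idx = np.where(data < 50, 0, np.where(data < 500, 1, 2))

fig, ax = plt.subplots()
markers = plot_clusters_in_order(ax, data, idx)
# fig.savefig("clusters.png") or plt.show() to look at the result
```

With millions of points the Python overhead stays at three calls; drawing that many markers can still take a while, so a smaller marker size may help.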

Recommended answer

You can plot each clustered data point against its distance from its cluster center, and label each point with its original index, so you can see how the points are scattered according to their cluster membership:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, vq
from scipy.spatial.distance import cdist

# floating-point observations for scipy's k-means routines
data = np.array([5, 1, 100, 102, 3, 4, 999, 1001, 5, 1, 2, 150, 180, 175, 898, 1012],
                dtype=float)
centroid, _ = kmeans(data, 3)
idx, _ = vq(data, centroid)

# cdist expects 2-D inputs: one row per observation / centroid
X = data.reshape(len(data), 1)
Y = centroid.reshape(len(centroid), 1)
D_k = cdist(X, Y, metric='euclidean')   # shape (len(data), number of centroids)

colors = ['red', 'green', 'blue']
mark = ['^', 'o', '>']
cIdx = D_k.argmin(axis=1)   # index of the nearest centroid for each point
dist = D_k.min(axis=1)      # distance to that centroid
r = np.vstack((data, dist)).T

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for i, ((x, y), kls) in enumerate(zip(r, cIdx)):
    ax.plot(x, y, color=colors[kls], marker=mark[kls])
    # label each point with its original index so the input order stays visible
    ax.annotate(str(i), xy=(x, y), xytext=(0.5, 0.5), textcoords='offset points',
                size=8, color=colors[kls])

# a point lying exactly on its centroid has distance 0, which a log axis cannot show
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlabel('Data')
ax.set_ylabel('Distance')
plt.show()
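For 1-D data, the cdist call followed by the per-row argmin/min steps can also be written with NumPy broadcasting and axis-wise reductions. A minimal sketch, with hypothetical centroid values (the means of the three obvious groups in the sample) in place of the kmeans result so it runs without SciPy:

```python
import numpy as np

data = np.array([5, 1, 100, 102, 3, 4, 999, 1001, 5, 1, 2, 150, 180, 175, 898, 1012],
                dtype=float)
# Hypothetical centroids standing in for the result of kmeans(data, 3)
centroid = np.array([3.0, 141.4, 977.5])

# |data[i] - centroid[j]| for every pair: shape (len(data), len(centroid)),
# the same matrix cdist gives for these 1-D points reshaped to columns
D_k = np.abs(data[:, None] - centroid[None, :])

cIdx = D_k.argmin(axis=1)          # nearest-centroid label for each point, in input order
dist = D_k.min(axis=1)             # distance to that centroid
r = np.column_stack((data, dist))  # same layout as np.vstack((data, dist)).T
```

With these centroids, the value 3 at index 4 sits exactly on its centroid, so its distance is 0 and it would not be visible on the answer's log-scaled distance axis.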

Update

If you are keen on using a vectorized procedure, you can do it as follows with randomly generated data:

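import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, vq

data = np.random.uniform(1, 1000, 3000)
# cluster the new data; labels from the 16-point example above do not fit this array
centroid, _ = kmeans(data, 3)
cIdx, _ = vq(data, centroid)
colors = ['red', 'green', 'blue']
mark = ['>', 'o', '^']

# np.vectorize is a convenience wrapper around a Python loop,
# so this still makes one ax.plot call per point
@np.vectorize
def plotting(i):
    ax.plot(i, data[i], color=colors[cIdx[i]], marker=mark[cIdx[i]])

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
plotting(np.arange(len(data)))
ax.set_xlabel('index')
ax.set_ylabel('Data')
plt.show()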
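np.vectorize does not remove the per-point cost: NumPy's documentation describes it as a convenience that is essentially a for loop, so the code above still makes one ax.plot call per point. A sketch of an alternative, assuming the same kmeans/vq labelling, draws every point with a single ax.scatter call and maps labels to colors by indexing a color array. The seeded generator, marker size and Agg backend are choices made for this sketch, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.vq import kmeans, vq

rng = np.random.default_rng(0)
data = rng.uniform(1, 1000, 3000)

centroid, _ = kmeans(data, 3)   # k-means on the 1-D values
cIdx, _ = vq(data, centroid)    # nearest-centroid label of each point, in input order

colors = np.array(["red", "green", "blue"])
fig, ax = plt.subplots()
# a single scatter call: colors[cIdx] turns the label array into a per-point color array
sc = ax.scatter(np.arange(len(data)), data, c=colors[cIdx], s=4)
ax.set_xlabel("index")
ax.set_ylabel("Data")
```

scatter takes one marker style per call, so keeping a different marker shape per cluster would mean one scatter call per cluster with a boolean mask.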
