Using Python to generate clusters of data?


Problem description


I'm working on a Python function in which I want to model a Gaussian distribution, but I'm stuck.

import numpy.random as rnd
import numpy as np

def genData(co1, co2, M):
  X = rnd.randn(2, 2M + 1)
  t = rnd.randn(1, 2M + 1)
  numpy.concatenate(X, co1)
  numpy.concatenate(X, co2)
  return(X, t)

I'm trying for two clusters of size M, cluster 1 is centered at co1, cluster 2 is centered at co2. X would return the data points I'm going to graph, and t are the target values (1 if cluster 1, 2 if cluster 2) so I can color it by cluster.

In that case, t has 2M entries of 1s and 2s and X is size 2M * 1, where t[i] is 1 if X[i] is in cluster 1 and 2 if X[i] is in cluster 2.

I figured the best way to start is to generate the array using numpy's random module. What I'm confused about is how to get it centered on each cluster.


Would the best way be to generate a cluster sized M, then add co1 to each of the points? How would I make it random though, and make sure t[i] is colored in properly?

I'm using this function to graph the data:

def graphData():
    co1 = (0.5, -0.5)
    co2 = (-0.5, 0.5)
    M = 1000
    X, t = genData(co1, co2, M)
    colors = np.array(['r', 'b'])
    plt.figure()
    plt.scatter(X[:, 0], X[:, 1], color = colors[t], s = 10)

Solution

For your purpose, I would use sklearn's sample generator make_blobs:

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# One (x, y) centre and one standard deviation per cluster
centers = [(-5, -5), (5, 5)]
cluster_std = [0.8, 1]

X, y = make_blobs(n_samples=100, cluster_std=cluster_std, centers=centers, n_features=2, random_state=1)

# Plot each cluster in its own colour
plt.scatter(X[y == 0, 0], X[y == 0, 1], color="red", s=10, label="Cluster1")
plt.scatter(X[y == 1, 0], X[y == 1, 1], color="blue", s=10, label="Cluster2")
plt.legend()
plt.show()

You can generate multi-dimensional clusters with this. X holds the data points and y indicates which cluster the corresponding point in X belongs to.
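
To make the return values concrete, here is a small sketch that inspects what make_blobs returns for the setup in the question. The centres and M come from the question's graphData; the cluster_std value of 0.2 and the names t and point_colors are my own assumptions, chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs

co1, co2 = (0.5, -0.5), (-0.5, 0.5)   # centres from the question
M = 1000                               # points per cluster

# n_samples is the total number of points, split evenly across the centres.
X, y = make_blobs(n_samples=2 * M, centers=[co1, co2],
                  cluster_std=0.2,     # assumed spread, not from the question
                  random_state=1)

print(X.shape)          # one row per point, one column per coordinate
print(np.bincount(y))   # number of points carrying each label

# y uses labels 0 and 1, so it can index a colour array directly.
colors = np.array(['r', 'b'])
point_colors = colors[y]

# The question's 1/2 targets are the same labels shifted by one.
t = y + 1
```

Because the labels start at 0, the colors[t] indexing in the question's graphData only works with y itself, not with the shifted 1/2 targets.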

This might be more than you need in this case, but in general I think it's better to rely on more general, better-tested library code that can be reused in other cases as well.
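
If you would rather stay with plain NumPy, the idea from the question (draw M standard-normal points, then add the centre to each one) also works. The following is only a sketch under my own assumptions: the scale and seed parameters are not in the question, and the labels are 0/1 rather than 1/2 so that colors[t] in graphData indexes correctly.

```python
import numpy as np

def genData(co1, co2, M, scale=0.2, seed=None):
    """Return (X, t): 2*M two-dimensional points and their 0/1 cluster labels.

    scale and seed are hypothetical extras, not part of the question.
    """
    rng = np.random.default_rng(seed)
    # Scaled standard-normal noise, shifted so each block sits on its centre.
    cluster1 = scale * rng.standard_normal((M, 2)) + np.asarray(co1)
    cluster2 = scale * rng.standard_normal((M, 2)) + np.asarray(co2)
    X = np.concatenate([cluster1, cluster2])   # shape (2*M, 2)
    t = np.concatenate([np.zeros(M, dtype=int), np.ones(M, dtype=int)])
    # Shuffle points and labels with the same permutation so they stay aligned.
    order = rng.permutation(2 * M)
    return X[order], t[order]

X, t = genData((0.5, -0.5), (-0.5, 0.5), M=1000, seed=0)
```

With these labels, the plt.scatter call in the question's graphData should work unchanged; add 1 to t afterwards if you need the 1/2 targets.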
