Clustering values by their proximity in python (machine learning?)


Problem Description

I have an algorithm that runs on a set of objects. This algorithm produces a score value that dictates the differences between the elements in the set.

The sorted output is something like this:

[1,1,5,6,1,5,10,22,23,23,50,51,51,52,100,112,130,500,512,600,12000,12230]

If you lay these values down on a spreadsheet, you see that they make up groups:

[1,1,5,6,1,5] [10,22,23,23] [50,51,51,52] [100,112,130] [500,512,600] [12000,12230]

Is there a way to programmatically get those groupings?

Maybe some clustering algorithm using a machine learning library? Or am I overthinking this?

I've looked at scikit but their examples are way too advanced for my problem...

Recommended Answer

Don't use clustering for 1-dimensional data

Clustering algorithms are designed for multivariate data. When you have one-dimensional data, sort it and look for the largest gaps. This is trivial and fast in 1d, and not possible in 2d. If you want something more advanced, use Kernel Density Estimation (KDE) and look for local minima to split the data set.
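As an illustration of the gap-based idea, here is a minimal sketch; the cluster_by_gap helper and the threshold value are assumptions for this example and are not part of the original answer, and a single fixed threshold will not necessarily reproduce the hand-made groups from the question exactly:

def cluster_by_gap(values, gap):
    # Group sorted 1-d values: start a new group whenever the
    # difference between consecutive values exceeds `gap`.
    data = sorted(values)
    if not data:
        return []
    groups = [[data[0]]]
    for prev, cur in zip(data, data[1:]):
        if cur - prev > gap:
            groups.append([cur])       # gap too large: start a new group
        else:
            groups[-1].append(cur)     # close enough: same group
    return groups

scores = [1, 1, 5, 6, 1, 5, 10, 22, 23, 23, 50, 51, 51, 52,
          100, 112, 130, 500, 512, 600, 12000, 12230]
print(cluster_by_gap(scores, gap=20))

A relative criterion (for example, the gap divided by the previous value) may match the intended grouping better than an absolute threshold.

For the KDE variant, one possible sketch using SciPy is shown below; the grid resolution and the default bandwidth are assumptions and usually need tuning, so the resulting split points depend heavily on them:

import numpy as np
from scipy.signal import argrelmin
from scipy.stats import gaussian_kde

scores = np.array([1, 1, 5, 6, 1, 5, 10, 22, 23, 23, 50, 51, 51, 52,
                   100, 112, 130, 500, 512, 600, 12000, 12230], dtype=float)

# Estimate the density on a grid and take its local minima as split points.
kde = gaussian_kde(scores)
grid = np.linspace(scores.min(), scores.max(), 10000)
density = kde(grid)
split_points = grid[argrelmin(density)[0]]

# Assign each score to the segment between consecutive split points.
labels = np.searchsorted(split_points, scores)
groups = [scores[labels == k].tolist() for k in np.unique(labels)]
print(groups)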

This question has many duplicates:

  • 1D Number Array Clustering
  • Cluster one-dimensional data optimally?
