Clustering values by their proximity in python (machine learning?)

Question

I have an algorithm that is running on a set of objects. This algorithm produces a score value that dictates the differences between the elements in the set.

The sorted output is something like this:

[1,1,5,6,1,5,10,22,23,23,50,51,51,52,100,112,130,500,512,600,12000,12230]

If you lay these values down on a spreadsheet, you see that they make up groups:

[1,1,5,6,1,5] [10,22,23,23] [50,51,51,52] [100,112,130] [500,512,600] [12000,12230]

Is there a way to programmatically get those groupings?

Maybe some clustering algorithm using a machine learning library? Or am I overthinking this?

I've looked at scikit-learn, but their examples are way too advanced for my problem...

Recommended answer

Don't use clustering for 1-dimensional data

Clustering algorithms are designed for multivariate data. When you have 1-dimensional data, sort it and look for the largest gaps. This is trivial and fast in 1-d, but has no counterpart in 2-d, where the points cannot simply be sorted along a single axis. If you want something more advanced, use kernel density estimation (KDE) and look for local minima of the estimated density to split the data set.
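A minimal pure-Python sketch of both ideas is below, written without libraries so the logic stays visible. The function names `split_at_largest_gaps` and `split_at_kde_minima` are hypothetical, and the number of groups, the Gaussian kernel, the bandwidth and the grid size are illustrative assumptions rather than part of the answer. In practice a library KDE (for example `scipy.stats.gaussian_kde` or scikit-learn's `KernelDensity`) would replace the hand-written density.

```python
import math


def split_at_largest_gaps(values, n_groups):
    """Sort the values and cut them at the (n_groups - 1) widest gaps."""
    xs = sorted(values)
    if n_groups <= 1 or len(xs) < 2:
        return [xs]
    # Gap i lies between xs[i] and xs[i + 1]; rank the gaps from widest to narrowest.
    widest = sorted(range(len(xs) - 1), key=lambda i: xs[i + 1] - xs[i], reverse=True)
    cut_after = sorted(widest[: n_groups - 1])
    groups, start = [], 0
    for i in cut_after:
        groups.append(xs[start : i + 1])
        start = i + 1
    groups.append(xs[start:])
    return groups


def split_at_kde_minima(values, bandwidth, grid_size=1000):
    """Cut the sorted values at local minima of a Gaussian kernel density estimate.

    bandwidth and grid_size are illustrative assumptions; the result depends on both.
    """
    xs = sorted(values)
    lo, hi = xs[0] - 3 * bandwidth, xs[-1] + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + k * step for k in range(grid_size)]
    # Unnormalized Gaussian KDE: scaling does not move the minima, so it is skipped.
    density = [
        sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in xs) for g in grid
    ]
    # Interior grid points lower than the left neighbour and not higher than the right one.
    cuts = [
        grid[k]
        for k in range(1, grid_size - 1)
        if density[k] < density[k - 1] and density[k] <= density[k + 1]
    ]
    groups, current, c = [], [], 0
    for x in xs:
        # Stepping past a cut point closes the current group.
        while c < len(cuts) and x > cuts[c]:
            if current:
                groups.append(current)
                current = []
            c += 1
        current.append(x)
    groups.append(current)
    return groups


if __name__ == "__main__":
    data = [1, 1, 5, 6, 1, 5, 10, 22, 23, 23, 50, 51, 51, 52,
            100, 112, 130, 500, 512, 600, 12000, 12230]
    print(split_at_largest_gaps(data, 6))
    # [[1, 1, 1, 5, 5, 6, 10, 22, 23, 23, 50, 51, 51, 52], [100, 112, 130],
    #  [500, 512], [600], [12000], [12230]]
    print(split_at_kde_minima([1, 2, 3, 10, 11, 12], bandwidth=1.0))
    # [[1, 2, 3], [10, 11, 12]]
```

On the question's data, the widest absolute gaps all lie among the large values (600 to 12000, 130 to 500, 12000 to 12230, and so on), so asking for six groups does not reproduce the hand-drawn grouping. Choosing the number of groups, a gap threshold, or the KDE bandwidth (and whether to rescale values that span several orders of magnitude) remains a judgment call.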

There are many duplicates of this question.
