How would you group/cluster these three areas in arrays in Python?

Problem description

So you have an array

1
2
3
60
70
80
100
220
230
250

For better understanding:

How would you group/cluster the three areas in arrays in Python (v2.6), so that in this case you get three arrays containing


[1 2 3] [60 70 80 100] [220 230 250]

Background:

The y-axis is frequency, the x-axis is number. These numbers are the ten highest amplitudes, represented by their frequencies. I want to create three discrete numbers from them for pattern recognition. There could be many more points, but all of them are grouped by a relatively big frequency difference, as you can see in this example between about 0 and about 50 and between about 100 and about 220. Note that what counts as big or small changes, but the difference between clusters remains significant compared to the difference between elements within a group/cluster.

Recommended answer

This is a simple algorithm implemented in Python that checks whether a value is too far (in terms of standard deviation) from the mean of a cluster:

from math import sqrt

def stat(lst):
    """Calculate mean and std deviation from the input list."""
    n = float(len(lst))
    mean = sum(lst) / n
    stdev = sqrt((sum(x*x for x in lst) / n) - (mean * mean)) 
    return mean, stdev

def parse(lst, n):
    cluster = []
    for i in lst:
        if len(cluster) <= 1:    # the first two values are going directly in
            cluster.append(i)
            continue

        mean,stdev = stat(cluster)
        if abs(mean - i) > n * stdev:    # check the "distance"
            yield cluster
            cluster = []    # start a new list so the yielded cluster is not emptied afterwards

        cluster.append(i)
    yield cluster           # yield the last cluster

This will return what you expect in your example with 5 < n < 9:

>>> array = [1, 2, 3, 60, 70, 80, 100, 220, 230, 250]
>>> for cluster in parse(array, 7):
...     print(cluster)
[1, 2, 3]
[60, 70, 80, 100]
[220, 230, 250]
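
Since the result depends on the threshold n, it can help to try several values before settling on one. The following is only a sketch under a few assumptions: sweep is a hypothetical helper that is not part of the answer, stat clamps the variance at zero to guard against rounding, and parse yields a new list for each cluster (and nothing for empty input) so that the results can be collected with list().

```python
from math import sqrt

def stat(lst):
    """Return the mean and population standard deviation of lst."""
    n = float(len(lst))
    mean = sum(lst) / n
    # max() guards against a tiny negative variance caused by rounding
    variance = max(sum(x * x for x in lst) / n - mean * mean, 0.0)
    return mean, sqrt(variance)

def parse(lst, n):
    """Same distance rule as the answer; every yielded cluster is a fresh list."""
    cluster = []
    for i in lst:
        if len(cluster) >= 2:    # the first two values go directly in
            mean, stdev = stat(cluster)
            if abs(mean - i) > n * stdev:    # too far from the current cluster
                yield cluster
                cluster = []
        cluster.append(i)
    if cluster:    # assumption: yield nothing for an empty input
        yield cluster

def sweep(values, thresholds):
    """Hypothetical helper: map each threshold n to the clusters it produces."""
    return dict((n, list(parse(values, n))) for n in thresholds)

array = [1, 2, 3, 60, 70, 80, 100, 220, 230, 250]
results = sweep(array, [3, 7, 10])
for n in sorted(results):
    print("n=%s -> %s" % (n, results[n]))
```

With the sample array, n = 7 gives the three expected groups, while n = 10 merges the last two groups into one.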
