Pattern recognition for data partition


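Problem description

We have a series of data; please see the chart below. It is basically 24 hours of data, and we want to split the 24 hours automatically into several parts based on similar values.

For example, we can see that the values from 0am to 7am are roughly the same, so we mark that period as group 1. Between 7am and 17:00 the values are again (roughly) similar, so that becomes group 2, and 17:00 to 24:00 becomes group 3.

If we input a series of data (x, y) and specify how many partitions there should be (for example 3), the result should look like this: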

Group 1: 0am-7am
Group 2: 7am-17
Group 3: 17-24


The chart is here:

http://pot9tw.sn2.livefilestore.com/y1pvO0HSvBYGu22JKakg4XJSys_zM1H9qohg7XqDwxxSOgz1orlRNjJhLKCOYyvZ9RZqHyyUHErZFQbgaNvUHuZ4FRIIOTlFuHV/graph.PNG?psid=1


Solution

This is called "cluster analysis", and your problem is probably the most elementary case of it.
I don't know what your chances are of finding someone with ready-to-use code or an algorithm, but you can start with this article, just to get an idea of what's involved:

http://en.wikipedia.org/wiki/Cluster_analysis

The article is very clear and provides very good references you may want to follow. At least you'll learn the proper terminology, which will help you in your search for relevant approaches and algorithms. I actually just took a close look myself: the algorithms are easy enough to code from scratch. What you need to do is compare them and decide which one is best for your purpose; I think this is the more difficult part and will require some research.

Good luck,

—SA
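
As one illustration of how small such an algorithm can be, here is a minimal sketch of one-dimensional k-means (Lloyd's algorithm), one of the methods the Wikipedia article covers. The class name `KMeans1D`, the evenly spaced starting centroids and the `maxIterations` default are assumptions made for this example, not something the answer prescribes. Note that plain k-means groups by value only and ignores time order, so the resulting clusters are not guaranteed to be contiguous hour ranges.

```csharp
using System;
using System.Linq;

static class KMeans1D
{
    // Returns one cluster label (0..k-1) per input value.
    // Assumes values is non-empty and k >= 1.
    public static int[] Cluster(double[] values, int k, int maxIterations = 100)
    {
        double min = values.Min(), max = values.Max();

        // Assumption: start with centroids evenly spaced between min and max.
        double[] centroids = Enumerable.Range(0, k)
            .Select(c => min + (max - min) * (c + 0.5) / k)
            .ToArray();

        // -1 means "not assigned yet", so the first pass always counts as a change.
        int[] labels = Enumerable.Repeat(-1, values.Length).ToArray();

        for (int iter = 0; iter < maxIterations; iter++)
        {
            bool changed = false;

            // Assignment step: each value joins its nearest centroid.
            for (int i = 0; i < values.Length; i++)
            {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (Math.Abs(values[i] - centroids[c]) < Math.Abs(values[i] - centroids[best]))
                        best = c;
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }
            if (!changed) break; // assignments are stable, so we are done

            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < k; c++)
            {
                double sum = 0;
                int count = 0;
                for (int i = 0; i < values.Length; i++)
                    if (labels[i] == c) { sum += values[i]; count++; }
                if (count > 0) centroids[c] = sum / count;
            }
        }
        return labels;
    }
}
```

Calling `KMeans1D.Cluster(new double[] { 1, 2, 1, 10, 11, 10, 25, 24 }, 3)` labels the three low values 0, the three middle values 1 and the two high values 2. For the 24-hour problem you would still have to turn runs of equal labels into hour ranges.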


Hi, I implemented the clustering with my own method. The logic is that it first finds the top n deltas (differences between adjacent values), then splits the data at those deltas.

using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApplication612
{
    class Program
    {
        // Orders { index, delta } pairs by delta, largest first.
        private class ValueSort : IComparer<int[]>
        {
            public int Compare(int[] x, int[] y)
            {
                if (x[1] > y[1])
                    return -1;
                else if (x[1] < y[1])
                    return 1;
                else
                    return 0;
            }
        }
        // Orders { index, delta } pairs by index, smallest first.
        private class IndexSort : IComparer<int[]>
        {
            public int Compare(int[] x, int[] y)
            {
                if (x[0] > y[0])
                    return 1;
                else if (x[0] < y[0])
                    return -1;
                else
                    return 0;
            }
        }
        static void Main(string[] args)
        {
            int[] data = new int[] { 1, 2, 1, 3, 2, 4, 5, 11, 10, 12, 10, 12, 13, 25, 21, 24, 20, 4, 5, 2, 3 };
            // delta[i] = { i, |data[i] - data[i - 1]| }; entry 0 has no predecessor, so its delta is 0.
            int[][] delta = new int[data.Length][];
            delta[0] = new int[2];
            delta[0][0] = 0;
            delta[0][1] = 0;
            for (int i = 0; i < data.Length - 1; i++)
            {
                delta[i + 1] = new int[2];
                delta[i + 1][0] = i + 1;
                delta[i + 1][1] = Math.Abs(data[i + 1] - data[i]);
            }
            // Put the largest jumps first; the top (groups - 1) of them become the split points.
            Array.Sort(delta, new ValueSort());
            int groups = 4;
            int[][] topDelta = new int[groups - 1][];
            for (int i = 0; i < topDelta.Length; i++)
            {
                topDelta[i] = delta[i];
            }
            Array.Sort(topDelta, new IndexSort());
            List<List<int>> output = new List<List<int>>();
            int index = 0;
            for (int i = 0; i < topDelta.Length; i++)
            {
                List<int> group = new List<int>();
                for (int j = index; j < topDelta[i][0]; j++)
                {
                    group.Add(data[j]);
                }
                // The element at the split point opens the next group; adding 1 here would drop it.
                index = topDelta[i][0];
                output.Add(group);
            }
            // The last group runs from the final split point to the end of the data.
            List<int> group2 = new List<int>();
            for (int j = topDelta[topDelta.Length - 1][0]; j < data.Length; j++)
            {
                group2.Add(data[j]);
            }
            output.Add(group2);
            for (int i = 0; i < output.Count; i++)
            {
                Console.Write("Group" + i + ":");
                foreach (int value in output[i])
                {
                    Console.Write(value + ",");
                }
                Console.WriteLine();
            }
        }
    }
}
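
Since the question asks for time ranges rather than lists of values, the same largest-delta idea can also be sketched so that it returns index ranges. This is only an illustrative variant: the names `DeltaSplit` and `Split` are made up for this example, it uses LINQ and value tuples (C# 7 or later), and it assumes `1 <= groups <= data.Length`. Every element ends up in exactly one range.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeltaSplit
{
    // Returns (Start, End) index ranges, End exclusive. The cuts fall at the
    // (groups - 1) largest absolute differences between neighbouring values.
    public static List<(int Start, int End)> Split(int[] data, int groups)
    {
        // Index i (1..n-1) is a candidate cut: a new group would start at i.
        List<int> cuts = Enumerable.Range(1, data.Length - 1)
            .OrderByDescending(i => Math.Abs(data[i] - data[i - 1]))
            .Take(groups - 1)
            .OrderBy(i => i)
            .ToList();

        var ranges = new List<(int Start, int End)>();
        int start = 0;
        foreach (int cut in cuts)
        {
            ranges.Add((start, cut));
            start = cut; // the element at the cut opens the next group
        }
        ranges.Add((start, data.Length));
        return ranges;
    }
}
```

For the sample array above with `groups = 4`, the three largest jumps are at indices 7, 13 and 17, so the ranges are 0-7, 7-13, 13-17 and 17-21. If each element is one hourly reading, the range boundaries map directly to hours.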


Here is a thought:
Use Bollinger Bands

Calibrate them to fit your needs and monitor the rate of change of the moving average. This should let you detect some of the interesting changes in your data. Comparing the input values against the moving average plus or minus the standard deviation multiplied by a calibrated factor indicates other points of interest. While Bollinger Bands are primarily associated with visual interpretation of financial data, there is no reason not to take advantage of the method computationally :)

Regards
Espen Harlinn
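
A rough sketch of the band test described above: for each position it takes the simple moving average and population standard deviation of the last `window` samples (including the current one) and reports the indices that fall outside average +/- `width` times the deviation. The names `BollingerSketch`, `Breakouts`, `window` and `width` are assumptions made for this example, and the right values have to be calibrated against your own data.

```csharp
using System;
using System.Collections.Generic;

static class BollingerSketch
{
    // Returns the indices whose value lies outside
    // movingAverage +/- width * standardDeviation of the trailing window.
    // Assumes 1 <= window <= values.Length.
    public static List<int> Breakouts(double[] values, int window, double width)
    {
        var hits = new List<int>();
        for (int i = window - 1; i < values.Length; i++)
        {
            int first = i - window + 1;

            // Simple moving average over the window ending at i.
            double sum = 0;
            for (int j = first; j <= i; j++) sum += values[j];
            double mean = sum / window;

            // Population standard deviation over the same window.
            double squares = 0;
            for (int j = first; j <= i; j++) squares += (values[j] - mean) * (values[j] - mean);
            double deviation = Math.Sqrt(squares / window);

            if (Math.Abs(values[i] - mean) > width * deviation) hits.Add(i);
        }
        return hits;
    }
}
```

With `values = { 2, 2, 2, 2, 2, 10, 10, 10, 10, 10 }`, `window = 4` and `width = 1.5`, only index 5 (the first 10) is reported, which is where the level changes. Because the current sample is part of the window, a single value can sit at most sqrt(window - 1) deviations from the average, so `width` must stay below that for anything to be flagged.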

