Time-series segmentation in Python


Question

I am trying to segment time-series data as shown in the figure. I have a lot of sensor data, and each series can contain a different number of isolated peak regions. The figure shows three of them. I would like a function that takes the time series as input and returns the segmented sections, all of equal length.

My initial thought was to use a sliding window that calculates the relative change in amplitude. Since a window containing peaks will show a relatively larger change, I could define a threshold on the relative change and keep the windows that contain isolated peaks. However, choosing the threshold is a problem, because the relative change is very sensitive to noise in the data.
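
For reference, the sliding-window idea described above could be sketched roughly as follows. This is only an illustration under assumptions: it uses the peak-to-peak range of each non-overlapping window as a stand-in for "relative change", and the names `windowed_range` and `active_windows`, the window size, and the threshold are all hypothetical. The threshold still has to be tuned against the noise level, which is exactly the difficulty raised in the question.

```python
import numpy as np


def windowed_range(y, window):
    """Peak-to-peak amplitude of each non-overlapping window of `window` samples."""
    n_windows = len(y) // window
    # Drop the trailing samples that do not fill a whole window.
    trimmed = np.asarray(y[: n_windows * window], dtype=float)
    blocks = trimmed.reshape(n_windows, window)
    return blocks.max(axis=1) - blocks.min(axis=1)


def active_windows(y, window, threshold):
    """Boolean mask of windows whose amplitude range exceeds `threshold`."""
    return windowed_range(y, window) > threshold


# Synthetic example (assumed data): uniform noise with one sine burst.
rng = np.random.default_rng(0)
y = rng.uniform(-0.3, 0.3, size=300)
y[100:200] += np.sin(np.linspace(0, 4 * np.pi, 100))

mask = active_windows(y, window=50, threshold=1.0)
print(mask)
```

With noise bounded by 0.3, a noise-only window cannot have a range above 0.6, so a threshold of 1.0 only flags the two windows that contain the burst. On real sensor data the noise bound is not known in advance, so the threshold would need to be estimated from a quiet stretch of the signal.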

Any suggestions?

Recommended answer

To do this, you need to separate the signal from the noise.

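  1. Take the mean of the signal and add and subtract the noise amplitude scaled by a multiplier to place borders at the top and bottom of the noise (the green dashed lines)
  2. Find the peak samples below the bottom noise border -> array with 2 groups of data
  3. Find the peak samples above the top noise border -> array with 2 groups of data
  4. Take the minimum index of the first peak's bottom samples and the maximum index of the first peak's top samples to get the range of the first peak
  5. Take the minimum index of the second peak's top samples and the maximum index of the second peak's bottom samples to get the range of the second peak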

The code contains some further comments. The same method can be used to find other peaks. The one thing you need to supply by hand is an x value that lies between the peaks, so that the data can be split into parts.

See the plot for a summary.

import numpy as np
from matplotlib import pyplot as plt


# create noise data
def function(x, noise):
    y = np.sin(7*x+2) + noise
    return y

def function2(x, noise):
    y = np.sin(6*x+2) + noise
    return y


noise = np.random.uniform(low=-0.3, high=0.3, size=(100,))
# two noisy sine bursts that act as the peak regions
x_line0 = np.linspace(1.95, 2.85, 100)
y_line0 = function(x_line0, noise)
x_pik = np.linspace(3.95, 5, 100)
y_pik = function2(x_pik, noise)
# concatenate noise data
x = np.linspace(0, 6, 500)
y = np.concatenate((noise, y_line0, noise, y_pik, noise), axis=0)

# compute the noise band around the mean, then plot the data
noise_band = 1.1
top_noise = y.mean()+noise_band*np.amax(noise)
bottom_noise = y.mean()-noise_band*np.amax(noise)
fig, ax = plt.subplots()
ax.axhline(y=y.mean(), color='red', linestyle='--')
ax.axhline(y=top_noise, linestyle='--', color='green')
ax.axhline(y=bottom_noise, linestyle='--', color='green')
ax.plot(x, y)

# split data into 2 signals
def split(arr, cond):
  return [arr[cond], arr[~cond]]

# find indexes of samples below the bottom noise border
bottom_data_indexes = np.argwhere(y < bottom_noise)
# split at an x value chosen by eye (x = 3 lies between the two peaks)
splitted_bottom_data = split(bottom_data_indexes, bottom_data_indexes < np.argmax(x > 3))

# find top noise data indexes
top_data_indexes = np.argwhere(y > top_noise)
# split by visual x value
splitted_top_data = split(top_data_indexes, top_data_indexes < np.argmax(x > 3))

# get first signal range
first_signal_start = np.amin(splitted_bottom_data[0])
first_signal_end = np.amax(splitted_top_data[0])

# get x index of first signal
x_first_signal = np.take(x, [first_signal_start, first_signal_end])
ax.axvline(x=x_first_signal[0], color='orange')
ax.axvline(x=x_first_signal[1], color='orange')

# get second signal range
second_signal_start = np.amin(splitted_top_data[1])
second_signal_end = np.amax(splitted_bottom_data[1])

# get x index of second signal
x_second_signal = np.take(x, [second_signal_start, second_signal_end])
ax.axvline(x=x_second_signal[0], color='orange')
ax.axvline(x=x_second_signal[1], color='orange')

plt.show()

Output:

red line = mean value of all data

green lines = top and bottom noise borders

orange lines = selected peak data
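
The answer's code needs a hand-picked x value between the peaks, and it does not return equal-length sections as the question asks. One possible way around both points, sketched here under assumptions rather than taken from the answer, is to group the out-of-band samples by index gaps and then cut a fixed-length window centred on each group. The helper names `find_segments` and `equal_length_windows` and the parameters `max_gap` and `length` are hypothetical. `max_gap` still has to be chosen, but it is a single tolerance in samples rather than a per-dataset split point.

```python
import numpy as np


def find_segments(y, low, high, max_gap):
    """Group samples outside [low, high] into (start, end) index pairs.

    Out-of-band samples separated by more than `max_gap` indexes are
    treated as belonging to different segments.
    """
    y = np.asarray(y, dtype=float)
    idx = np.flatnonzero((y < low) | (y > high))
    if idx.size == 0:
        return []
    # Positions in idx where the jump to the next out-of-band sample is too large.
    breaks = np.flatnonzero(np.diff(idx) > max_gap)
    starts = np.concatenate(([idx[0]], idx[breaks + 1]))
    ends = np.concatenate((idx[breaks], [idx[-1]]))
    return [(int(s), int(e)) for s, e in zip(starts, ends)]


def equal_length_windows(y, segments, length):
    """Cut a window of `length` samples centred on each segment.

    Windows are shifted inward at the array edges; if `length` exceeds
    len(y), the returned window is shorter than requested.
    """
    y = np.asarray(y, dtype=float)
    windows = []
    for start, end in segments:
        centre = (start + end) // 2
        lo = min(max(centre - length // 2, 0), max(len(y) - length, 0))
        windows.append(y[lo:lo + length])
    return windows


# Synthetic example (assumed data): noise with two one-period sine bursts.
rng = np.random.default_rng(0)
y = rng.uniform(-0.3, 0.3, size=500)
y[100:200] += np.sin(np.linspace(0, 2 * np.pi, 100))
y[300:400] += np.sin(np.linspace(0, 2 * np.pi, 100))

segments = find_segments(y, low=-0.35, high=0.35, max_gap=30)
windows = equal_length_windows(y, segments, length=120)
print(segments, [len(w) for w in windows])
```

Here `low` and `high` play the role of the green noise borders. `max_gap=30` is large enough to bridge the zero crossing inside each burst, where the signal briefly falls back inside the noise band, but smaller than the quiet stretch between the two bursts.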
