How to find patterns in Pandas?

Problem description

Using pandas and Python, I want to find a pattern where a stream's inflow is much larger than usual and is followed, within 5 days, by an outflow of no less than 5% of that inflow. See the data frame below.

I want to be able to flag this movement in a new column (let's call it flag).

Imagine this data frame has thousands of rows, and you want to find every occurrence of this pattern and flag it throughout.

Index    date           stream
0        2019-01-01        2
1        2019-01-02        0
2        2019-01-03        1
3        2019-01-04        0
4        2019-01-05        3
5        2019-01-06        2
7        2019-01-07        100
8        2019-01-08        0
9        2019-01-09        0
10       2019-01-10       -95
11       2019-01-11        3    
12       2019-01-13        0  
13       2019-01-14        2
14       2019-01-15       -1
15       2019-01-16        0
16       2019-01-17        2
17       2019-01-18        93
18       2019-01-19       -2
19       2019-01-20       -89

Recommended answer

Try computing a rolling average on df['stream'] and flagging values that are far above it.

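import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
date = [
    '2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05',
    '2019-01-06', '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
    '2019-01-11', '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
    '2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20'
]

df = pd.DataFrame({'date': date, 'stream': stream})

def process(row):
    # Flag a value that exceeds 20x the mean of the previous 5 rows.
    # The baseline must be positive: after a large outflow the mean turns
    # negative, and almost any value would exceed 20x a negative number.
    if row['stream_mean'] > 0 and row['stream'] > 20 * row['stream_mean']:
        return 1
    return 0

# Mean over a 5-row window, shifted by one row so that the current
# value is not part of its own baseline.
df['stream_mean'] = df['stream'].rolling(5).mean()
df['stream_mean'] = df['stream_mean'].shift(periods=1)
df['flag'] = df.apply(process, axis=1)
df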
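
The rolling-average check only detects the large inflow. The question also asks that an outflow of at least 5% of that inflow follows within 5 days. A minimal sketch of that second step follows. It makes several assumptions: the threshold of 20x is carried over from the code above, "an outflow" is read as the largest single-day outflow, the window is measured in calendar days (the sample data skips 2019-01-12), and the helper name has_followup_outflow and the constants are hypothetical.

```python
import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
date = [
    '2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05',
    '2019-01-06', '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
    '2019-01-11', '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
    '2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20'
]
df = pd.DataFrame({'date': pd.to_datetime(date), 'stream': stream})

SPIKE_FACTOR = 20         # assumption: same multiplier as the answer above
WINDOW_DAYS = 5           # from the question: outflow within 5 days
MIN_OUTFLOW_SHARE = 0.05  # from the question: at least 5% of the inflow

# Step 1: candidate spikes, using the mean of the previous 5 rows as baseline.
baseline = df['stream'].rolling(5).mean().shift(1)
is_spike = (baseline > 0) & (df['stream'] > SPIKE_FACTOR * baseline)

def has_followup_outflow(i):
    """True if some row within WINDOW_DAYS after row i is a large enough outflow."""
    day = df.at[i, 'date']
    inflow = df.at[i, 'stream']
    in_window = (df['date'] > day) & (df['date'] <= day + pd.Timedelta(days=WINDOW_DAYS))
    # min() is the most negative value, i.e. the largest single outflow;
    # an empty window gives NaN, and the comparison is then False.
    largest_outflow = -df.loc[in_window, 'stream'].min()
    return bool(largest_outflow >= MIN_OUTFLOW_SHARE * inflow)

# Step 2: keep only the spikes that are followed by a qualifying outflow.
df['flag'] = 0
for i in df.index[is_spike.to_numpy()]:
    if has_followup_outflow(i):
        df.loc[i, 'flag'] = 1

print(df[df['flag'] == 1])
```

The flag is placed on the inflow row here. For thousands of rows the loop only visits candidate spikes, so it should stay cheap as long as spikes are rare.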

It would be better to apply a Bollinger Band: create a rolling standard deviation column alongside the rolling mean, and perhaps also try a 95% confidence interval approach.
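
One way this suggestion could look is sketched below. It is an illustration under assumptions, not the answerer's code: the 5-row window and the 2-standard-deviation band width (roughly a 95% interval if values were normally distributed) are chosen here, and the column names stream_std and upper_band are hypothetical.

```python
import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
df = pd.DataFrame({'stream': stream})

WINDOW = 5   # assumption: same window as the rolling-average approach
NUM_STD = 2  # assumption: about 95% coverage under normality

# Shift first so each row's band is built only from the rows before it.
previous = df['stream'].shift(1)
df['stream_mean'] = previous.rolling(WINDOW).mean()
df['stream_std'] = previous.rolling(WINDOW).std()
df['upper_band'] = df['stream_mean'] + NUM_STD * df['stream_std']

# Rows without a full window have a NaN band and are never flagged.
df['flag'] = (df['stream'] > df['upper_band']).astype(int)

print(df[df['flag'] == 1])
```

Compared with a fixed multiple of the mean, the band widens after volatile periods, so the small values that follow a large outflow are not flagged even though the rolling mean is negative there.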

Hope this helps :)
