Pandas: flag consecutive values

Question

I have a dataset in the format [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1].

0: indicates economic increase.
1: indicates economic decline.

A recession is signaled by two consecutive declines (1).

The end of the recession is signaled by two consecutive increases (0).

In the above dataset there are two recessions: one begins at index 3 and ends at index 5, and the other begins at index 8 and ends at index 11.

I am at a loss for how to approach this with pandas. I would like to identify the index of the start and end of each recession. Any assistance would be appreciated.

Here is my Python attempt at a solution:

import numpy as np

np_decline = np.array([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
recession_start_flag = 0
recession_end_flag = 0
recession_start = []
recession_end = []

for i in range(len(np_decline) - 1):
    if recession_start_flag == 0 and np_decline[i] == 1 and np_decline[i + 1] == 1:
        recession_start.append(i)
        recession_start_flag = 1
    if recession_start_flag == 1 and np_decline[i] == 0 and np_decline[i + 1] == 0:
        recession_end.append(i - 1)
        recession_start_flag = 0

print(recession_start)
print(recession_end)

Is there a more pandas-centric approach?

Leon

Recommended answer

The start of a run of 1s satisfies the condition

x_prev = x.shift(1)
x_next = x.shift(-1)
((x_prev != 1) & (x == 1) & (x_next == 1))

That is, the value at the start of a run is 1, the previous value is not 1 (a missing previous value at the first position also counts as "not 1"), and the next value is 1. Similarly, the end of a run satisfies the condition

((x == 1) & (x_next == 0) & (x_next2 == 0))

since the value at the end of a run is 1 and the next two values are 0, mirroring the question's rule that a recession ends only after two consecutive increases. We can find the indices where these conditions are true using np.flatnonzero:

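import numpy as np
import pandas as pd

x = pd.Series([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
x_prev = x.shift(1)    # previous value (NaN at the first position)
x_next = x.shift(-1)   # next value (NaN at the last position)
x_next2 = x.shift(-2)  # value two steps ahead (NaN at the last two positions)
df = pd.DataFrame(
    dict(
        # start of a run: x is 1, the previous value is not 1, the next value is 1
        start=np.flatnonzero((x_prev != 1) & (x == 1) & (x_next == 1)),
        # end of a run: x is 1 and the next two values are both 0
        end=np.flatnonzero((x == 1) & (x_next == 0) & (x_next2 == 0)),
    )
)
print(df[['start', 'end']])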

This prints:

   start  end
0      3    5
1      8   11
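One caveat: the masks above detect the boundaries of runs of 1s, which is not exactly the question's rule, because a single 0 inside a recession does not end it. On an input such as [1, 1, 0, 1, 1, 0, 0], the start mask fires at indices 0 and 3 while the end mask fires only at 4, so the two arrays differ in length and pd.DataFrame raises a ValueError; the question's loop instead reports one recession from 0 to 4. A minimal sketch that follows the question's rule more literally, still without an explicit Python loop, marks every "two consecutive 1s" and "two consecutive 0s" signal and forward-fills the most recent one as the recession state. The function name recession_bounds is hypothetical, and the sketch assumes a list or Series of 0/1 values with a default integer index.

```python
import numpy as np
import pandas as pd


def recession_bounds(values):
    """Hypothetical helper: return (starts, ends) per the two-in-a-row rule.

    Assumes 0/1 values (1 = decline, 0 = increase) and a default RangeIndex.
    """
    x = pd.Series(values)
    nxt = x.shift(-1)  # next value, NaN at the last position

    # Signal at i: values i and i+1 are both declines -> recession begins at i.
    enter = (x == 1) & (nxt == 1)
    # Signal at i: values i and i+1 are both increases -> recession is over.
    leave = (x == 0) & (nxt == 0)

    # NaN everywhere except at signals; forward-filling carries the latest
    # signal, so repeated signals of the same kind have no extra effect.
    state = pd.Series(np.nan, index=x.index)
    state.loc[enter] = 1
    state.loc[leave] = 0
    state = state.ffill().fillna(0).astype(int)  # 0 before any signal

    prev = state.shift(1, fill_value=0)
    starts = np.flatnonzero((state == 1) & (prev == 0))
    # The recession ends on the element just before the first of the two 0s.
    ends = np.flatnonzero((state == 0) & (prev == 1)) - 1
    return starts.tolist(), ends.tolist()


print(recession_bounds([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]))
# ([3, 8], [5, 11])
```

Because starts and ends are returned as separate lists, a recession still in progress at the end of the data appears as a start without a matching end; for example, [0, 1, 1, 0] gives ([1], []).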
