Pandas first date condition is met while another condition is active

Views: 62 | Posted: 2020/5/18 23:04:31 | Tags: python-3.x pandas numpy

Problem description

I have a dataframe with a time series of scores. My goal is to detect when the score is larger than a certain threshold th and then to find when the score goes back to 0. It is quite easy to find each condition separately:

import numpy as np

dates_1 = score > th  # score above the threshold
dates_2 = np.sign(score[1:]) != np.sign(score.shift(1).dropna())  # sign differs from the previous day's

However, I don't know the most pythonic way to restrict dates_2 so that it only includes dates that come after an 'active' dates_1 has been observed.

Perhaps I could use an auxiliary column 'active' that is set to True whenever score > th and reset to False once the dates_2 condition is met. Then I could ask for a sign change AND active == True. However, that approach requires iteration, and I'm wondering whether there is a vectorized solution to my problem.

Any thoughts on how to improve my approach?

Sample data:

date         score
2010-01-04   0.0
2010-01-05  -0.3667779798467592
2010-01-06  -1.9641427199568868
2010-01-07  -0.49976215445519134
2010-01-08  -0.7069108074548405
2010-01-11  -1.4624766212523337
2010-01-12  -0.9132777669357441
2010-01-13   0.16204588193577152
2010-01-14   0.958085568609925
2010-01-15   1.4683022129399834
2010-01-19   3.036016680985081
2010-01-20   2.2357911432637345
2010-01-21   2.8827438241030707
2010-01-22   -3.395977874791837
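To make the example reproducible, the sample table can be loaded straight into a DataFrame. This is a minimal sketch that assumes the table is pasted exactly as shown (whitespace-separated, with a header row); io.StringIO simply stands in for a file, and the variable name data is only for illustration.

```python
import io

import pandas as pd

# The sample table from the question, pasted verbatim (whitespace-separated)
data = """\
date         score
2010-01-04   0.0
2010-01-05  -0.3667779798467592
2010-01-06  -1.9641427199568868
2010-01-07  -0.49976215445519134
2010-01-08  -0.7069108074548405
2010-01-11  -1.4624766212523337
2010-01-12  -0.9132777669357441
2010-01-13   0.16204588193577152
2010-01-14   0.958085568609925
2010-01-15   1.4683022129399834
2010-01-19   3.036016680985081
2010-01-20   2.2357911432637345
2010-01-21   2.8827438241030707
2010-01-22   -3.395977874791837
"""

# sep=r"\s+" splits on any run of whitespace; parse_dates converts "date" to datetime64
df = pd.read_csv(io.StringIO(data), sep=r"\s+", parse_dates=["date"])
score = df["score"]  # the Series the question refers to as `score`
print(df)
```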

Expected Output

With th = 0.94:

date    active
2010-01-04  False
2010-01-05  False
2010-01-06  False
2010-01-07  False
2010-01-08  False
2010-01-11  False
2010-01-12  False
2010-01-13  False
2010-01-14  True
2010-01-15  True
2010-01-19  True
2010-01-20  True
2010-01-21  True
2010-01-22  False

Recommended answer

Not Vectorized!

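def alt_cond(s, th):
    # on/off flag: switch on once x >= th, switch off once x <= 0
    active = False
    for x in s:
        # inactive: wait for x >= th; active: stay on only while x > 0
        active = x > 0 if active else x >= th
        yield active

df.assign(A=[*alt_cond(df.score, 0.94)])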

          date     score      A
0   2010-01-04  0.000000  False
1   2010-01-05 -0.366778  False
2   2010-01-06 -1.964143  False
3   2010-01-07 -0.499762  False
4   2010-01-08 -0.706911  False
5   2010-01-11 -1.462477  False
6   2010-01-12 -0.913278  False
7   2010-01-13  0.162046  False
8   2010-01-14  0.958086   True
9   2010-01-15  1.468302   True
10  2010-01-19  3.036017   True
11  2010-01-20  2.235791   True
12  2010-01-21  2.882744   True
13  2010-01-22 -3.395978  False

Vectorized (Sort Of)

I used Numba to speed this up considerably. It is still a loop, but it should be very fast if you can install Numba.

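import numpy as np
from numba import njit

@njit
def alt_cond(s, th):
    # on/off flag: switch on once x >= th, switch off once x <= 0
    active = False
    # np.bool_ instead of the deprecated np.bool8 alias
    out = np.zeros(len(s), dtype=np.bool_)
    for i, x in enumerate(s):
        if active:
            if x <= 0:
                active = False
        else:
            if x >= th:
                active = True
        out[i] = active
    return out

# Numba needs a NumPy array, not a pandas Series
df.assign(A=alt_cond(df.score.to_numpy(), .94))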
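The loop is only needed because the flag has memory. When th > 0, however, the switch-on condition (score >= th) and the switch-off condition (score <= 0) can never hold on the same row, so every row either forces the state or inherits it from the row before. That is what a forward fill does, which suggests a fully vectorized alternative. This is a sketch of that swapped-in technique, not the answer's method; the name alt_cond_vectorized is hypothetical, and it assumes th > 0 and no missing scores.

```python
import numpy as np
import pandas as pd


def alt_cond_vectorized(s: pd.Series, th: float) -> pd.Series:
    """Loop-free on/off flag: on once s >= th, off once s <= 0 (assumes th > 0)."""
    if th <= 0:
        # with th <= 0 a row can meet both conditions and the outcome would
        # depend on the previous state, so the forward-fill shortcut breaks
        raise ValueError("alt_cond_vectorized requires th > 0")
    # NaN = no decision on this row; 1.0 = switch on; 0.0 = switch off
    state = pd.Series(np.nan, index=s.index)
    state[s >= th] = 1.0
    state[s <= 0] = 0.0
    # carry the last decision forward; rows before any decision start inactive
    return state.ffill().fillna(0.0).astype(bool)


# usage on the question's scores with th = 0.94
scores = [0.0, -0.3667779798467592, -1.9641427199568868, -0.49976215445519134,
          -0.7069108074548405, -1.4624766212523337, -0.9132777669357441,
          0.16204588193577152, 0.958085568609925, 1.4683022129399834,
          3.036016680985081, 2.2357911432637345, 2.8827438241030707,
          -3.395977874791837]
df = pd.DataFrame({"score": scores})
df["A"] = alt_cond_vectorized(df["score"], 0.94)
print(df)
```

The fillna(0.0) plays the role of active = False in the loop versions: until the first row that reaches th or drops to 0, the flag stays off.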

Response to Comment

You can keep a dictionary that maps column names to threshold values and iterate over it:

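th = {'score': 0.94}

# [*alt_cond(...)] uses the generator version of alt_cond, which accepts a Series
# one boolean column per threshold, named "<column>_A"
df.join(pd.DataFrame(
    np.column_stack([[*alt_cond(df[k], v)] for k, v in th.items()]),
    df.index, [f"{k}_A" for k in th]
))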
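If you would rather skip the intermediate array and the DataFrame constructor, DataFrame.assign accepts a dict of new columns unpacked with **, which keeps the column naming and the thresholds in one place. A minimal sketch, using a small hypothetical two-column frame with thresholds picked purely for illustration; alt_cond is the generator version from above, repeated so the block runs on its own.

```python
import pandas as pd


def alt_cond(s, th):
    # on/off flag: switch on once x >= th, switch off once x <= 0
    active = False
    for x in s:
        active = x > 0 if active else x >= th
        yield active


# hypothetical frame with two score columns and per-column thresholds
df = pd.DataFrame({
    "score": [0.0, 0.5, 1.2, 0.3, -0.1],
    "score2": [0.0, 2.5, 1.0, -1.0, 3.0],
})
th = {"score": 0.94, "score2": 2.0}

# one boolean column per entry in th, named "<column>_A"
out = df.assign(**{f"{k}_A": list(alt_cond(df[k], v)) for k, v in th.items()})
print(out)
```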


          date     score  score_A
0   2010-01-04  0.000000    False
1   2010-01-05 -0.366778    False
2   2010-01-06 -1.964143    False
3   2010-01-07 -0.499762    False
4   2010-01-08 -0.706911    False
5   2010-01-11 -1.462477    False
6   2010-01-12 -0.913278    False
7   2010-01-13  0.162046    False
8   2010-01-14  0.958086     True
9   2010-01-15  1.468302     True
10  2010-01-19  3.036017     True
11  2010-01-20  2.235791     True
12  2010-01-21  2.882744     True
13  2010-01-22 -3.395978    False
