Ignore np.where after first match for Pandas time series data


Question

As per the following example, I need to make my code ignore the np.where after the first match in time series data.

So on the 2014-03-04 14:00:00 row, np.where gives 1.0 in the test_output column and, as would be expected, also on the next row. I want this to trigger only once. The desired output is shown at the end of the question.

Thanks for taking a look at the question.

Dataframe generated for the test:

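import numpy as np
import pandas as pd

# Two days of hourly timestamps, 09:00-16:00 each (16 rows in total).
# The two ranges are joined with .append(); adding DatetimeIndex objects with + does not concatenate them.
idx = pd.date_range(start='2014-03-04 09:00:00', end='2014-03-04 16:15:00', freq='1h').append(
    pd.date_range(start='2014-03-05 09:00:00', end='2014-03-05 16:15:00', freq='1h'))
df = pd.DataFrame(index=idx, data={'test_1': np.nan})

# Set rows 5..15 positionally, avoiding chained assignment
df.iloc[5:16, df.columns.get_loc('test_1')] = 1.0

df['test_output'] = np.where(df['test_1'] == 1.0, 1.0, np.nan)
df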

                     test_1  test_output
2014-03-04 09:00:00     NaN          NaN
2014-03-04 10:00:00     NaN          NaN
2014-03-04 11:00:00     NaN          NaN
2014-03-04 12:00:00     NaN          NaN
2014-03-04 13:00:00     NaN          NaN
2014-03-04 14:00:00     1.0          1.0
2014-03-04 15:00:00     1.0          1.0
2014-03-04 16:00:00     1.0          1.0
2014-03-05 09:00:00     1.0          1.0

This is the desired output:

                     test_1  test_output
2014-03-04 09:00:00     NaN          NaN
2014-03-04 10:00:00     NaN          NaN
2014-03-04 11:00:00     NaN          NaN
2014-03-04 12:00:00     NaN          NaN
2014-03-04 13:00:00     NaN          NaN
2014-03-04 14:00:00     1.0          1.0
2014-03-04 15:00:00     1.0          NaN
2014-03-04 16:00:00     1.0          NaN
2014-03-05 09:00:00     1.0          NaN

Recommended answer

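Use first_valid_index on the masked df to find the first matching row, and set only that row. Note that this assigns a single cell, so the output below assumes test_output is all NaN beforehand rather than already filled by the np.where call from the question: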

In [30]:
df.loc[df[df['test_1'] == 1.0].first_valid_index(),'test_output'] = 1.0
df

Out[30]:
                     test_1  test_output
2014-03-04 09:00:00     NaN          NaN
2014-03-04 10:00:00     NaN          NaN
2014-03-04 11:00:00     NaN          NaN
2014-03-04 12:00:00     NaN          NaN
2014-03-04 13:00:00     NaN          NaN
2014-03-04 14:00:00     1.0          1.0
2014-03-04 15:00:00     1.0          NaN
2014-03-04 16:00:00     1.0          NaN
2014-03-05 09:00:00     1.0          NaN
2014-03-05 10:00:00     1.0          NaN
2014-03-05 11:00:00     1.0          NaN
2014-03-05 12:00:00     1.0          NaN
2014-03-05 13:00:00     1.0          NaN
2014-03-05 14:00:00     1.0          NaN
2014-03-05 15:00:00     1.0          NaN
2014-03-05 16:00:00     1.0          NaN
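
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the 16-row hourly frame from the question and resets test_output to NaN before marking the first match. The names idx, mask, and first_hit are introduced here purely for illustration, and the None check is an added precaution, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: two days of hourly rows, 09:00-16:00 each.
idx = pd.date_range('2014-03-04 09:00:00', '2014-03-04 16:15:00', freq='1h').append(
    pd.date_range('2014-03-05 09:00:00', '2014-03-05 16:15:00', freq='1h'))
df = pd.DataFrame(index=idx, data={'test_1': np.nan})
df.iloc[5:16, df.columns.get_loc('test_1')] = 1.0

# Assumption: start from an all-NaN output column so only the first match is set.
df['test_output'] = np.nan

mask = df['test_1'] == 1.0
first_hit = df[mask].first_valid_index()  # None when no row matches
if first_hit is not None:
    df.loc[first_hit, 'test_output'] = 1.0

print(df)
```

Guarding on None keeps the assignment from running when the mask selects no rows at all.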

Breaking the above down:

In [32]:
df['test_1'] == 1.0

Out[32]:
2014-03-04 09:00:00    False
2014-03-04 10:00:00    False
2014-03-04 11:00:00    False
2014-03-04 12:00:00    False
2014-03-04 13:00:00    False
2014-03-04 14:00:00     True
2014-03-04 15:00:00     True
2014-03-04 16:00:00     True
2014-03-05 09:00:00     True
2014-03-05 10:00:00     True
2014-03-05 11:00:00     True
2014-03-05 12:00:00     True
2014-03-05 13:00:00     True
2014-03-05 14:00:00     True
2014-03-05 15:00:00     True
2014-03-05 16:00:00     True
Freq: BH, Name: test_1, dtype: bool

In [33]:
df[df['test_1'] == 1.0].first_valid_index()

Out[33]:
Timestamp('2014-03-04 14:00:00', offset='BH')

You can also do it with np.where by masking the df again. np.where returns 1.0 where the condition holds and 0 elsewhere; comparing that array against 1 gives a boolean mask, and first_valid_index on the masked df returns the first matching row:

In [41]:
df.loc[df[np.where(df['test_1'] == 1.0, 1.0, 0) == 1].first_valid_index(), 'test_output'] = 1.0

df
Out[41]:
                     test_1  test_output
2014-03-04 09:00:00     NaN          NaN
2014-03-04 10:00:00     NaN          NaN
2014-03-04 11:00:00     NaN          NaN
2014-03-04 12:00:00     NaN          NaN
2014-03-04 13:00:00     NaN          NaN
2014-03-04 14:00:00     1.0          1.0
2014-03-04 15:00:00     1.0          NaN
2014-03-04 16:00:00     1.0          NaN
2014-03-05 09:00:00     1.0          NaN
2014-03-05 10:00:00     1.0          NaN
2014-03-05 11:00:00     1.0          NaN
2014-03-05 12:00:00     1.0          NaN
2014-03-05 13:00:00     1.0          NaN
2014-03-05 14:00:00     1.0          NaN
2014-03-05 15:00:00     1.0          NaN
2014-03-05 16:00:00     1.0          NaN
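
If the goal is to keep a single np.where expression, one possible alternative (not taken from the answer above, only a sketch) is to narrow the condition to the first True in the mask. The cumulative sum of the boolean mask equals 1 from the first match up to the second, so combining it with the mask itself isolates the first match. The small 8-row frame and the names mask and first_only are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 8-row frame: five NaN rows followed by three matches.
idx = pd.date_range('2014-03-04 09:00:00', periods=8, freq='1h')
df = pd.DataFrame({'test_1': [np.nan] * 5 + [1.0] * 3}, index=idx)

mask = df['test_1'] == 1.0
# cumsum() counts matches so far; it equals 1 only until the second match appears.
first_only = mask & (mask.cumsum() == 1)

df['test_output'] = np.where(first_only, 1.0, np.nan)
print(df)
```

Because the condition is computed from the mask alone, this variant does not depend on the previous contents of test_output, and it leaves the column all NaN when nothing matches.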
