Finding the Index of All Pattern Occurrences in a Pandas DataFrame
Question
I'm using a Pandas dataframe indexed by datetimes that looks something like this:
TimeSys_Index
2014-08-29 00:00:18 0
2014-08-29 00:00:19 0
2014-08-29 00:00:20 1
2014-08-29 00:00:21 1
2014-08-29 00:00:22 0
2014-08-29 00:00:23 0
2014-08-29 00:00:24 0
2014-08-29 00:00:25 0
2014-08-29 00:00:26 0
2014-08-29 00:00:27 1
2014-08-29 00:00:28 1
2014-08-29 00:00:29 1
2014-08-29 00:00:30 1
2014-08-29 00:00:31 0
2014-08-29 00:00:32 0
2014-08-29 00:00:33 0
...
I want to find the index (time) for every occurrence of the pattern [0, 0, 1, 1]. Using the above sequence I'd like it to return ['2014-08-29 00:00:18', '2014-08-29 00:00:25']. The kicker is this needs to be vectorized or at least very quick.
I was thinking of running a correlation of the full vector with the pattern vector and finding the indices where the result equals 4 (which would require mapping the 0s to -1 first), but there's got to be a simpler way.
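The correlation idea does work, but only after remapping the values. With raw 0/1 data, the dot product with [0, 0, 1, 1] just counts the 1s in the last two positions of each window, so it peaks at 2 and cannot tell [0, 0, 1, 1] apart from [1, 1, 1, 1]. Mapping 0 to -1 makes every position contribute +1 on a match and -1 on a mismatch, so a window scores exactly 4 only when it matches. A minimal sketch with NumPy, assuming the values live in a column named `val`:

```python
import numpy as np
import pandas as pd

# Sample data from the question; the column name "val" is an assumption.
idx = pd.date_range("2014-08-29 00:00:18", periods=16, freq="s", name="TimeSys_Index")
df = pd.DataFrame({"val": [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]}, index=idx)

pattern = np.array([0, 0, 1, 1])

# Map 0 -> -1 so each position adds +1 on a match and -1 on a mismatch.
signal = 2 * df["val"].to_numpy() - 1
kernel = 2 * pattern - 1

# In "valid" mode, scores[k] = sum(signal[k + n] * kernel[n]) over the pattern,
# which equals len(pattern) only when the window starting at k matches exactly.
scores = np.correlate(signal, kernel, mode="valid")
starts = np.flatnonzero(scores == len(pattern))

matches = df.index[starts]
print(matches)
```

This stays fully vectorized, but it relies on the data being strictly binary; for arbitrary values a direct element-wise comparison is simpler.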
Answer
You can compare shifted copies of the column against each element of the pattern:
>>> df.head()
val
TimeSys_Index
2014-08-29 00:00:18 0
2014-08-29 00:00:19 0
2014-08-29 00:00:20 1
2014-08-29 00:00:21 1
2014-08-29 00:00:22 0
>>> i = (df['val'] == 0) & (df['val'].shift(-1) == 0)
>>> i &= (df['val'].shift(-2) == 1) & (df['val'].shift(-3) == 1)
>>> df.index[i]
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-08-29 00:00:18, 2014-08-29 00:00:25]
Length: 2, Freq: None, Timezone: None
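The answer hard-codes one comparison per pattern element. A minimal sketch that builds the same mask for an arbitrary pattern by looping over the pattern (not over the data), so each step is still a vectorized comparison; the helper name `pattern_starts` and the column name `val` are hypothetical, not part of pandas:

```python
import pandas as pd


def pattern_starts(s: pd.Series, pattern) -> pd.Index:
    """Hypothetical helper: index labels where `pattern` begins in `s`."""
    # Start with every row as a candidate and AND in one test per element.
    mask = pd.Series(True, index=s.index)
    for offset, value in enumerate(pattern):
        # shift(-offset) lines up s[i + offset] with row i. The last rows
        # become NaN, and NaN == value is False, so windows that run past
        # the end of the data never match.
        mask &= s.shift(-offset) == value
    return s.index[mask.to_numpy()]


# Sample data from the question; the column name "val" is an assumption.
idx = pd.date_range("2014-08-29 00:00:18", periods=16, freq="s", name="TimeSys_Index")
df = pd.DataFrame({"val": [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]}, index=idx)

matches = pattern_starts(df["val"], [0, 0, 1, 1])
print(matches)
```

The cost is one pass over the data per pattern element, which is fine for short patterns like [0, 0, 1, 1].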