pandas - Selecting pair of consecutive rows matching criteria


Question

I have a dataframe that looks like this

>>> a_df
    state
1    A
2    B
3    A
4    B
5    C

What I'd like to do is return all consecutive rows matching a certain sequence. For instance, if the sequence is ['A', 'B'], then every row whose state is A and is immediately followed by a B should be returned, together with that B row. In the above example:

>>> cons_criteria(a_df, ['A', 'B'])
    state
1    A
2    B
3    A
4    B

Or if the chosen array is ['A', 'B', 'C'], then the output should be

>>> cons_criteria(a_df, ['A', 'B', 'C'])
    state
3    A
4    B
5    C

I decided to do this by storing the current state, as well as the next state:

>>> df2 = a_df.copy()
>>> df2['state_0'] = a_df['state']
>>> df2['state_1'] = a_df['state'].shift(-1)

Now I can match on state_0 and state_1. But this returns only the first row of each matching pair:

>>> df2[(df2['state_0'] == 'A') & (df2['state_1'] == 'B')]
    state
1    A
3    A

How should I fix the logic here so that all the consecutive rows are returned? Is there a better way to approach this in pandas?

Recommended answer

I would use a function like this:

import numpy as np
import pandas as pd

def match_slc(s, seq):
    # convert to a plain list; this makes zip faster
    l = s.values.tolist()
    # number of elements in the sequence
    k = len(seq)
    # build a 2-D array whose rows are rolling windows of k consecutive values
    a = np.array(list(zip(*[l[i:] for i in range(k)])))
    # positions (0 .. len(a) - 1) where every value in the window
    # matches the corresponding element of seq
    p = np.arange(len(a))[(a == seq).all(1)]
    # p marks the start of each match; add the following
    # k - 1 positions of each match as well
    slc = np.unique(np.hstack([p + i for i in range(k)]))
    return s.iloc[slc]
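
To make the intermediate steps easier to follow, here is a small walkthrough of the same expressions on the example data with the sequence ['A', 'B']. It is only an illustration of what `a`, `p`, and `slc` hold at each step; the input list is taken from the question, and nothing here changes the function above.

```python
import numpy as np

l = list('ABABC')
seq = list('AB')
k = len(seq)

# each row of `a` is one window of k consecutive values
a = np.array(list(zip(*[l[i:] for i in range(k)])))
print(a.tolist())    # [['A', 'B'], ['B', 'A'], ['A', 'B'], ['B', 'C']]

# start positions of the windows that equal seq element by element
p = np.arange(len(a))[(a == seq).all(1)]
print(p.tolist())    # [0, 2]

# expand each start position to cover its whole window, without duplicates
slc = np.unique(np.hstack([p + i for i in range(k)]))
print(slc.tolist())  # [0, 1, 2, 3]
```

Note that `slc` holds positions, not index labels, which is why the function selects with `iloc`.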


Demonstration

s = pd.Series(list('ABABC'))

print(match_slc(s, list('ABC')), '\n')
print(match_slc(s, list('AB')), '\n')

2    A
3    B
4    C
dtype: object 

0    A
1    B
2    A
3    B
dtype: object 
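
For comparison, the `shift` idea from the question can probably be extended to return every row of a match rather than only the first. The following is a minimal sketch, not part of the original answer: the name `cons_criteria` is borrowed from the question, and the `col` parameter is an assumption added for illustration. It first marks the rows where a full match starts, then spreads each start forward over the length of the sequence.

```python
import pandas as pd

def cons_criteria(df, seq, col='state'):
    # hypothetical helper; `col` is an assumed parameter name
    s = df[col]
    # starts is True at a row when that row and the following
    # len(seq) - 1 rows equal seq element by element
    starts = pd.Series(True, index=s.index)
    for i, value in enumerate(seq):
        starts = starts & (s.shift(-i) == value)
    # spread each start forward so every row of the match is kept
    keep = pd.Series(False, index=s.index)
    for i in range(len(seq)):
        keep = keep | starts.shift(i, fill_value=False)
    return df[keep]

a_df = pd.DataFrame({'state': list('ABABC')}, index=range(1, 6))
print(cons_criteria(a_df, ['A', 'B']))       # rows 1, 2, 3, 4
print(cons_criteria(a_df, ['A', 'B', 'C']))  # rows 3, 4, 5
```

This version stays in pandas and works directly on the DataFrame with its original index labels, at the cost of one `shift` per sequence element; `match_slc` does the comparison in NumPy on a Series instead.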
