Finding the index of rows based on a sequence of values in a column of pandas DataFrame

Problem description

I have a DataFrame with a column that contains three unique strings. I need to generate a list of the indexes of rows where 'very bad' comes directly after 'good', but not where 'very bad' comes after 'bad'.

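import random
import pandas as pd

# 20 random integer measurements between 0 and 10 (inclusive)
df = pd.DataFrame({
    'measure': [random.randint(0, 10) for _ in range(20)],
})

# Label each row: above 4 is 'good', below 2 is 'very bad', otherwise 'bad'
df['status'] = df.apply(
    lambda x: 'good' if x['measure'] > 4 else 'very bad' if x['measure'] < 2 else 'bad',
    axis=1)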



    measure    status
0         8      good
1         8      good
2         0  very bad
3         5      good
4         2       bad
5         3       bad
6         9      good
7         9      good
8        10      good
9         5      good
10        1  very bad
11        7      good
12        7      good
13        6      good
14        5      good
15       10      good
16        3       bad
17        0  very bad
18        3       bad
19        5      good

I would like to get this list:

[2,10]

Is there a one-line solution to this?

I don't want to use the numeric values, since they are here only to generate the DataFrame, and I don't want to loop over all rows, which is computationally expensive for my use case.

Recommended answer

If your DataFrame index is the default RangeIndex, you can use this:

import numpy as np
np.where((df['status'] == 'very bad') & (df['status'].shift() == 'good'))[0]

Output:

array([ 2, 10], dtype=int64)
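
To see why this works, it may help to look at the intermediate pieces. The following is a minimal sketch that assumes the status column shown in the question's table (typed in by hand here so the result is reproducible); the names `prev`, `mask`, and `positions` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Status column copied from the example table in the question.
status = ['good', 'good', 'very bad', 'good', 'bad', 'bad', 'good', 'good',
          'good', 'good', 'very bad', 'good', 'good', 'good', 'good', 'good',
          'bad', 'very bad', 'bad', 'good']
df = pd.DataFrame({'status': status})

# shift() moves every value down one row, so prev[i] is the status of row i - 1.
# The first row has no predecessor and gets a missing value, which never equals 'good'.
prev = df['status'].shift()

# True where the row is 'very bad' and the row directly before it is 'good'.
mask = (df['status'] == 'very bad') & (prev == 'good')

# Integer positions of the True entries.
positions = np.where(mask)[0].tolist()
print(positions)  # [2, 10]
```

Row 17 is also 'very bad', but the row before it is 'bad', so the mask is False there and it is left out.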

Otherwise, you can use the following:

irow = np.where((df['status'] == 'very bad') & (df['status'].shift() == 'good'))[0]
df.index[irow]
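
As a rough illustration of the non-default-index case, the sketch below uses a small made-up DataFrame with string labels (the data and the names `positions`, `labels`, and `labels_direct` are assumptions for this example only). `np.where` returns integer positions, and indexing `df.index` with them converts those positions to labels; passing the boolean mask directly should give the same labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data with string labels instead of the default 0..n-1 index.
df = pd.DataFrame(
    {'status': ['good', 'very bad', 'bad', 'very bad', 'good', 'very bad']},
    index=['a', 'b', 'c', 'd', 'e', 'f'],
)

# Same condition as above: 'very bad' directly preceded by 'good'.
mask = (df['status'] == 'very bad') & (df['status'].shift() == 'good')

positions = np.where(mask)[0]                       # integer positions of matches
labels = df.index[positions].tolist()               # index labels at those positions
labels_direct = df.index[mask.to_numpy()].tolist()  # same labels via the boolean mask
print(labels)  # ['b', 'f']
```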
