Failed to filter rows containing a specific value in the index column after resetting index


Question

I'm organizing data on a number of plans. Each record includes the phase of the plan: P (Preliminary) or F (Final). I'm using the methods shown in the examples in the pandas documentation.

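import pandas as pd

# 'FilePath' is a placeholder for the workbook path.
# Older pandas releases spelled the keyword 'sheetname'; current ones use 'sheet_name'.
df1 = pd.read_excel('FilePath', sheet_name='ForFilter')
df1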


    landuse_SUB_ID  TYPE   RECD_DATE PHASE LAND_USE  CPACTIONDA
0               24     1  2000-04-07     P      ROW  2000-05-04
1               24     1  2000-04-07     P     NONE  2000-05-04
2               25     1  2000-08-10     P     COMM  2000-09-08
3               34     1  2000-04-14     F      REC  2000-04-14
4               34     1  2000-04-14     F      SFD  2000-04-14
5               35     1  2000-01-20     P     NONE  2000-02-02
6               42     1  2000-04-04     P      SFD  2000-05-01
7               42     1  2000-12-06     P      SFD  2001-01-03
8               43     1  2000-09-07     P     NONE  2000-09-21
9               51     1  2000-11-10     P     NONE  2000-11-28
10              53     1  2000-02-22     F      SFD  2000-02-22

After playing with the methods in the examples (using like and regex), it seems to me that they can only filter on the values in the index. I therefore changed the index:

df1_filter1 = df1.set_index('PHASE')


       landuse_SUB_ID  TYPE   RECD_DATE LAND_USE  CPACTIONDA
PHASE                                                       
P                  24     1  2000-04-07      ROW  2000-05-04
P                  24     1  2000-04-07     NONE  2000-05-04
P                  25     1  2000-08-10     COMM  2000-09-08
F                  34     1  2000-04-14      REC  2000-04-14
F                  34     1  2000-04-14      SFD  2000-04-14
P                  35     1  2000-01-20     NONE  2000-02-02
P                  42     1  2000-04-04      SFD  2000-05-01
P                  42     1  2000-12-06      SFD  2001-01-03
P                  43     1  2000-09-07     NONE  2000-09-21
P                  51     1  2000-11-10     NONE  2000-11-28
F                  53     1  2000-02-22      SFD  2000-02-22

Now that the data frame uses PHASE as its index, I used the like argument to filter df1_filter1:

df1_filter1.filter(like = 'F', axis = 0)

I get the error:

"ValueError: cannot reindex from a duplicate axis"

This seems like a really simple operation to me, so I'm wondering what I did wrong to cause this error, and what the best method (fewest steps and cleanest code) would be for this task.

Recommended answer

As already pointed out, filter is not required for this task. @Alexander's answer uses loc. As an alternative, you could also use query:

df1.query('PHASE == "F"')

    landuse_SUB_ID  TYPE   RECD_DATE PHASE LAND_USE  CPACTIONDA
3               34     1  2000-04-14     F      REC  2000-04-14
4               34     1  2000-04-14     F      SFD  2000-04-14
10              53     1  2000-02-22     F      SFD  2000-02-22
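For comparison, the loc variant mentioned above might look like the following. This is only a sketch: the small frame is rebuilt by hand from a few rows of the question's table, and the name final_plans is made up for illustration.

```python
import pandas as pd

# A handful of rows copied from the question's table, enough to show the idea.
df1 = pd.DataFrame({
    "landuse_SUB_ID": [24, 25, 34, 34, 53],
    "PHASE": ["P", "P", "F", "F", "F"],
    "LAND_USE": ["ROW", "COMM", "REC", "SFD", "SFD"],
})

# Boolean mask on the PHASE column; the default integer index is left untouched.
final_plans = df1.loc[df1["PHASE"] == "F"]
print(final_plans)
```

Both loc and query select on the column directly, so no set_index step is needed.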

filter also works fine for me:

df1_filter1 = df1.set_index('PHASE')
df1_filter1.filter(like='F', axis=0)

       landuse_SUB_ID  TYPE   RECD_DATE LAND_USE  CPACTIONDA
PHASE                                                       
F                  34     1  2000-04-14      REC  2000-04-14
F                  34     1  2000-04-14      SFD  2000-04-14
F                  53     1  2000-02-22      SFD  2000-02-22

Regarding your question about using regex:

df2 = df1.set_index('LAND_USE')
df2.filter(regex="E$", axis=0)

          landuse_SUB_ID  TYPE   RECD_DATE PHASE  CPACTIONDA
LAND_USE                                                    
NONE                  24     1  2000-04-07     P  2000-05-04
NONE                  35     1  2000-01-20     P  2000-02-02
NONE                  43     1  2000-09-07     P  2000-09-21
NONE                  51     1  2000-11-10     P  2000-11-28

This keeps all rows whose index label ends with an E.

You could achieve the same by, for example:

df1[df1['LAND_USE'].str.endswith('E')]

   landuse_SUB_ID  TYPE   RECD_DATE PHASE LAND_USE  CPACTIONDA
1              24     1  2000-04-07     P     NONE  2000-05-04
5              35     1  2000-01-20     P     NONE  2000-02-02
8              43     1  2000-09-07     P     NONE  2000-09-21
9              51     1  2000-11-10     P     NONE  2000-11-28
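If the pattern is more involved than a fixed suffix, a column-based counterpart of the regex filter could use Series.str.contains. The sketch below again uses a hand-built subset of the question's data, and ends_with_e is an illustrative name.

```python
import pandas as pd

# Hand-built subset of the question's table.
df1 = pd.DataFrame({
    "landuse_SUB_ID": [24, 24, 25, 34],
    "LAND_USE": ["ROW", "NONE", "COMM", "REC"],
})

# regex=True is the default; it is spelled out here only for clarity.
ends_with_e = df1[df1["LAND_USE"].str.contains(r"E$", regex=True)]
print(ends_with_e)
```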

The problem with filter here is that you first have to create an index with non-unique values, which is usually a bad idea. So I would go with .loc or .query.
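To make the non-unique index concrete, here is a small sketch on a hand-built subset of the data (by_phase and final_rows are illustrative names). Since the question reports an error and this answer does not, whether filter raises on a duplicate index appears to depend on the pandas version. A boolean mask built from the index is one way to select the rows without going through label-based reindexing.

```python
import pandas as pd

# Hand-built subset of the question's table.
df1 = pd.DataFrame({
    "landuse_SUB_ID": [24, 34, 34, 53],
    "PHASE": ["P", "F", "F", "F"],
    "LAND_USE": ["ROW", "REC", "SFD", "SFD"],
})

by_phase = df1.set_index("PHASE")

# 'F' appears three times, so the index is not unique.
print(by_phase.index.is_unique)

# Compare the index labels directly and select with the resulting boolean mask.
final_rows = by_phase[by_phase.index == "F"]
print(final_rows)
```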
