Failed to filter rows containing a specific value in the index column after resetting index
Question
I'm organizing data on a number of plans, which includes the phase of each plan: P (Preliminary) or F (Final). I'm using the methods shown in the examples in the pandas documentation.
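import pandas as pd
# The keyword is sheet_name in current pandas; the older spelling sheetname was removed.
df1 = pd.read_excel('FilePath', sheet_name='ForFilter')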
df1
landuse_SUB_ID TYPE RECD_DATE PHASE LAND_USE CPACTIONDA
0 24 1 2000-04-07 P ROW 2000-05-04
1 24 1 2000-04-07 P NONE 2000-05-04
2 25 1 2000-08-10 P COMM 2000-09-08
3 34 1 2000-04-14 F REC 2000-04-14
4 34 1 2000-04-14 F SFD 2000-04-14
5 35 1 2000-01-20 P NONE 2000-02-02
6 42 1 2000-04-04 P SFD 2000-05-01
7 42 1 2000-12-06 P SFD 2001-01-03
8 43 1 2000-09-07 P NONE 2000-09-21
9 51 1 2000-11-10 P NONE 2000-11-28
10 53 1 2000-02-22 F SFD 2000-02-22
After experimenting with the methods in that example (using the like and regex parameters), it seems to me that they can only filter on the values in the index. I therefore changed the index:
df1_filter1 = df1.set_index('PHASE')
landuse_SUB_ID TYPE RECD_DATE LAND_USE CPACTIONDA
PHASE
P 24 1 2000-04-07 ROW 2000-05-04
P 24 1 2000-04-07 NONE 2000-05-04
P 25 1 2000-08-10 COMM 2000-09-08
F 34 1 2000-04-14 REC 2000-04-14
F 34 1 2000-04-14 SFD 2000-04-14
P 35 1 2000-01-20 NONE 2000-02-02
P 42 1 2000-04-04 SFD 2000-05-01
P 42 1 2000-12-06 SFD 2001-01-03
P 43 1 2000-09-07 NONE 2000-09-21
P 51 1 2000-11-10 NONE 2000-11-28
F 53 1 2000-02-22 SFD 2000-02-22
Now that the DataFrame uses PHASE as its index, I used the like parameter to filter df1_filter1:
df1_filter1.filter(like = 'F', axis = 0)
I got the error:
"ValueError: cannot reindex from a duplicate axis"
This seems like a really simple operation to me, so I'm wondering what I did wrong to cause this error, and what the best method (fewest steps and cleanest code) would be for my problem.
Recommended answer
As already pointed out, filter is not required for this task. @Alexander's answer uses loc. As an alternative, you could also use query:
df1.query('PHASE == "F"')
landuse_SUB_ID TYPE RECD_DATE PHASE LAND_USE CPACTIONDA
3 34 1 2000-04-14 F REC 2000-04-14
4 34 1 2000-04-14 F SFD 2000-04-14
10 53 1 2000-02-22 F SFD 2000-02-22
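For comparison, the loc approach mentioned above could look roughly like this. It is a minimal sketch on a small stand-in frame (a few rows and columns taken from the question, not the full spreadsheet), and the variable names final_plans and same_rows are only illustrative:

```python
import pandas as pd

# Small stand-in for the question's data (a subset of its rows and columns).
df1 = pd.DataFrame({
    "landuse_SUB_ID": [24, 25, 34, 34, 53],
    "PHASE": ["P", "P", "F", "F", "F"],
    "LAND_USE": ["ROW", "COMM", "REC", "SFD", "SFD"],
})

# Boolean mask on the PHASE column; the default integer index is left untouched.
final_plans = df1.loc[df1["PHASE"] == "F"]

# The query() call from the answer selects the same rows.
same_rows = df1.query('PHASE == "F"')

print(final_plans)
```

Both variants keep PHASE as an ordinary column, so no index with duplicate labels is ever created.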
filter also works fine for me:
df1_filter1 = df1.set_index('PHASE')
df1_filter1.filter(like='F', axis=0)
landuse_SUB_ID TYPE RECD_DATE LAND_USE CPACTIONDA
PHASE
F 34 1 2000-04-14 REC 2000-04-14
F 34 1 2000-04-14 SFD 2000-04-14
F 53 1 2000-02-22 SFD 2000-02-22
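One detail worth keeping in mind is that like performs a substring match on the labels, not an equality test. The sketch below illustrates this with a made-up label "PF" that does not occur in the question's data; it is there only to show the difference from an exact comparison on the index:

```python
import pandas as pd

# Hypothetical labels: "PF" is NOT in the question's data and is only used
# to show that like= matches substrings.
df = pd.DataFrame(
    {"LAND_USE": ["ROW", "REC", "SFD"]},
    index=pd.Index(["P", "F", "PF"], name="PHASE"),
)

# Keeps every row whose index label contains "F".
substring_match = df.filter(like="F", axis=0)

# Keeps only the rows whose index label is exactly "F".
exact_match = df[df.index == "F"]

print(substring_match)
print(exact_match)
```

With only the labels P and F, as in the question, both selections return the same rows.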
Regarding your question about using regex:
df2 = df1.set_index('LAND_USE')
df2.filter(regex="E$", axis=0)
landuse_SUB_ID TYPE RECD_DATE PHASE CPACTIONDA
LAND_USE
NONE 24 1 2000-04-07 P 2000-05-04
NONE 35 1 2000-01-20 P 2000-02-02
NONE 43 1 2000-09-07 P 2000-09-21
NONE 51 1 2000-11-10 P 2000-11-28
Here you select all rows whose index label ends with an E.
You could achieve the same result by, for example:
df1[df1['LAND_USE'].str.endswith('E')]
landuse_SUB_ID TYPE RECD_DATE PHASE LAND_USE CPACTIONDA
1 24 1 2000-04-07 P NONE 2000-05-04
5 35 1 2000-01-20 P NONE 2000-02-02
8 43 1 2000-09-07 P NONE 2000-09-21
9 51 1 2000-11-10 P NONE 2000-11-28
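If a regular expression is preferred over endswith, the same pattern can be applied to the column directly with str.contains. This is a sketch on a reduced stand-in frame (a subset of the question's columns and rows), with illustrative variable names:

```python
import pandas as pd

# Reduced stand-in for the question's data.
df1 = pd.DataFrame({
    "landuse_SUB_ID": [24, 24, 25, 34, 34, 35],
    "LAND_USE": ["ROW", "NONE", "COMM", "REC", "SFD", "NONE"],
})

# Regex applied to the column values instead of to index labels.
by_regex = df1[df1["LAND_USE"].str.contains(r"E$")]

# Plain string test from the answer, for comparison.
by_endswith = df1[df1["LAND_USE"].str.endswith("E")]

print(by_regex)
```

Either way the original integer index is preserved, so the result can still be traced back to the source rows.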
The problem with filter here is that you first have to create an index with non-unique values, which is usually a bad idea. So I would go with .loc or .query.
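To illustrate why a non-unique index tends to cause trouble, the sketch below uses a small stand-in frame to check the index for uniqueness and then attempts a reindex, which pandas refuses when the existing labels are duplicated. The exact wording of the error message differs between pandas versions, so the code catches the ValueError and does not match on its text:

```python
import pandas as pd

# Small stand-in for the question's data.
df1 = pd.DataFrame({
    "PHASE": ["P", "P", "F", "F"],
    "LAND_USE": ["ROW", "NONE", "REC", "SFD"],
})

by_phase = df1.set_index("PHASE")

# The labels P and F each occur more than once.
print(by_phase.index.is_unique)

# Reindexing requires unique labels on the existing axis.
try:
    by_phase.reindex(["F"])
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print(f"reindex failed: {exc}")
```

This is only meant to show the general failure mode behind the error in the question, not the exact code path that a particular pandas version takes inside filter.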