pandas: filter rows by two column values, case-insensitively
Question
I have a simple dataframe as follows:
Last Known Date ConfigredValue ReferenceValue
0 24-Jun-17 False FALSE
1 25-Jun-17 FALSE FALSE
2 26-Jun-17 TRUE FALSE
3 27-Jun-17 FALSE FALSE
4 28-Jun-17 false FALSE
If I run the following command:
df=df[df['ConfigredValue']!=df['ReferenceValue']]
then I get the following:
0 24-Jun-17 False FALSE
2 26-Jun-17 TRUE FALSE
4 28-Jun-17 false FALSE
But I want the filter to be case-insensitive (case=False).
I want the following output:
2 26-Jun-17 TRUE FALSE
Please suggest how to filter the data case-insensitively (case=False).
Recommended answer
Option 1: Convert to lowercase or uppercase and compare
The simplest approach is to convert both columns to lowercase (or uppercase) before checking for equality:
df=df[df['ConfigredValue'].str.lower()!=df['ReferenceValue'].str.lower()]
or
df=df[df['ConfigredValue'].str.upper()!=df['ReferenceValue'].str.upper()]
Out:
Last Known Date ConfigredValue ReferenceValue
2 2 26-Jun-17 TRUE FALSE
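As a minimal, self-contained sketch of Option 1, the snippet below rebuilds the question's dataframe and applies the lowercase comparison. It assumes both columns hold strings (object dtype) rather than booleans; the names mask and result are illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question; every value is assumed to be a string.
df = pd.DataFrame({
    "Last Known Date": ["24-Jun-17", "25-Jun-17", "26-Jun-17", "27-Jun-17", "28-Jun-17"],
    "ConfigredValue": ["False", "FALSE", "TRUE", "FALSE", "false"],
    "ReferenceValue": ["FALSE"] * 5,
})

# Lowercase both columns, then keep only the rows where they still differ.
mask = df["ConfigredValue"].str.lower() != df["ReferenceValue"].str.lower()
result = df[mask]
print(result)
```

Only the row with index 2 should remain. If the columns may contain non-ASCII text, str.casefold() is likely a more thorough normalizer than str.lower().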
Option 2: Compare the lengths
In this particular case, you can simply compare the string lengths: TRUE and True both have length 4 whether upper or lower case, while FALSE and False have length 5. This only works because the columns contain nothing but these two values:
df[df['ConfigredValue'].str.len()!=df['ReferenceValue'].str.len()]
Out:
Last Known Date ConfigredValue ReferenceValue
2 2 26-Jun-17 TRUE FALSE
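The length comparison depends on the two possible values having different lengths. The sketch below illustrates where it could go wrong, using a hypothetical value "NONE" that is not in the original data; the series names s1 and s2 are also made up for illustration.

```python
import pandas as pd

# Hypothetical values: "NONE" has the same length as "TRUE" but differs from it.
s1 = pd.Series(["TRUE", "false", "True"])
s2 = pd.Series(["NONE", "FALSE", "FALSE"])

# Length comparison misses the first pair because both strings have length 4.
by_len = s1.str.len() != s2.str.len()

# Lowercase comparison detects it.
by_lower = s1.str.lower() != s2.str.lower()

print(by_len.tolist())    # [False, False, True]
print(by_lower.tolist())  # [True, False, True]
```

For data that may contain anything other than true/false strings, Option 1 is the safer choice.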
Option 3: Vectorized title
str.title() was also suggested in @0p3n5ourcE's answer; here is a vectorized version of it:
df[df['ConfigredValue'].str.title()!=df['ReferenceValue'].str.title()]
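Execution time
Benchmarking on this dataframe shows that str.len() is slightly faster (479 µs versus roughly 496 µs for the other options):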
In [35]: timeit df[df['ConfigredValue'].str.lower()!=df['ReferenceValue'].str.lower()]
1000 loops, best of 3: 496 µs per loop
In [36]: timeit df[df['ConfigredValue'].str.upper()!=df['ReferenceValue'].str.upper()]
1000 loops, best of 3: 496 µs per loop
In [37]: timeit df[df['ConfigredValue'].str.title()!=df['ReferenceValue'].str.title()]
1000 loops, best of 3: 495 µs per loop
In [38]: timeit df[df['ConfigredValue'].str.len()!=df['ReferenceValue'].str.len()]
1000 loops, best of 3: 479 µs per loop
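The timings above were taken on a five-row dataframe, where fixed overhead probably dominates. To check how the options compare on more data, something like the following sketch could be used with the standard timeit module. The frame big, the repeat count n_repeats, and the dictionaries candidates and timings are assumptions made for illustration; actual numbers will vary by machine and pandas version.

```python
import timeit
import pandas as pd

# Hypothetical larger frame: the question's five value pairs repeated 2,000 times.
n_repeats = 2_000
big = pd.DataFrame({
    "ConfigredValue": ["False", "FALSE", "TRUE", "FALSE", "false"] * n_repeats,
    "ReferenceValue": ["FALSE"] * (5 * n_repeats),
})

# One candidate filter per option discussed in the answer.
candidates = {
    "lower": lambda: big[big["ConfigredValue"].str.lower() != big["ReferenceValue"].str.lower()],
    "upper": lambda: big[big["ConfigredValue"].str.upper() != big["ReferenceValue"].str.upper()],
    "title": lambda: big[big["ConfigredValue"].str.title() != big["ReferenceValue"].str.title()],
    "len": lambda: big[big["ConfigredValue"].str.len() != big["ReferenceValue"].str.len()],
}

timings = {}
for name, func in candidates.items():
    # Best of 3 runs of 10 calls each, converted to seconds per call.
    timings[name] = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{name}: {timings[name] * 1e3:.3f} ms per call")
```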