pandas: filter rows by comparing two column values, case-insensitively


Question

I have a simple dataframe as follows:

  Last Known Date ConfigredValue ReferenceValue
0       24-Jun-17          False          FALSE
1       25-Jun-17          FALSE          FALSE
2       26-Jun-17           TRUE          FALSE
3       27-Jun-17          FALSE          FALSE
4       28-Jun-17          false          FALSE

If I run the following command:

df = df[df['ConfigredValue'] != df['ReferenceValue']]

I get the following:

0       24-Jun-17          False          FALSE
2       26-Jun-17           TRUE          FALSE
4       28-Jun-17          false          FALSE

But I want the comparison to be case-insensitive (something like case=False).

I want the following output:

2       26-Jun-17           TRUE          FALSE

Please suggest how to filter the rows with a case-insensitive comparison.

Recommended answer

Option 1: Convert to lowercase or uppercase and compare

The simplest approach is to convert both columns to lowercase (or uppercase) before checking for inequality:

df=df[df['ConfigredValue'].str.lower()!=df['ReferenceValue'].str.lower()]

df=df[df['ConfigredValue'].str.upper()!=df['ReferenceValue'].str.upper()]

Out:
  Last Known Date ConfigredValue ReferenceValue
2       26-Jun-17           TRUE          FALSE
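For readers who want to try Option 1 end to end, here is a minimal sketch. It assumes the three columns hold plain strings (not booleans) and rebuilds the sample data from the question by hand; the variable names `mask` and `result` are illustrative only.

```python
import pandas as pd

# Sample data mirroring the question; the values are strings, not booleans.
df = pd.DataFrame({
    "Last Known Date": ["24-Jun-17", "25-Jun-17", "26-Jun-17", "27-Jun-17", "28-Jun-17"],
    "ConfigredValue": ["False", "FALSE", "TRUE", "FALSE", "false"],
    "ReferenceValue": ["FALSE"] * 5,
})

# Normalize case on both sides, then keep the rows where the values differ.
mask = df["ConfigredValue"].str.lower() != df["ReferenceValue"].str.lower()
result = df[mask]
print(result)
```

Only the row with index 2 should remain. Note that missing values stay missing after `.str.lower()`, and a missing value never compares equal to anything, so rows with NaN in either column would also be kept by this filter.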


Option 2: Compare the lengths

In this particular case, you can simply compare the string lengths: TRUE and True both have length 4, while FALSE and false both have length 5, so the lengths match whether the strings are upper or lower case. This shortcut only works because the columns contain nothing but variants of true and false:

df[df['ConfigredValue'].str.len()!=df['ReferenceValue'].str.len()]

Out:
  Last Known Date ConfigredValue ReferenceValue
2       26-Jun-17           TRUE          FALSE
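The length comparison depends on the data containing only true/false variants. The sketch below uses hypothetical data (the value NONE is invented for illustration and does not appear in the question) to show where the shortcut and the case-normalized comparison disagree.

```python
import pandas as pd

# Hypothetical data: row 1 holds two different values of equal length.
df = pd.DataFrame({
    "ConfigredValue": ["TRUE", "TRUE", "false"],
    "ReferenceValue": ["FALSE", "NONE", "FALSE"],
})

# Option 2: flag rows whose string lengths differ.
by_length = df[df["ConfigredValue"].str.len() != df["ReferenceValue"].str.len()]

# Option 1: flag rows whose case-normalized values differ.
by_lower = df[df["ConfigredValue"].str.lower() != df["ReferenceValue"].str.lower()]

print(by_length.index.tolist())  # rows flagged by the length comparison
print(by_lower.index.tolist())   # rows flagged by the lowercase comparison
```

The length comparison misses row 1 because TRUE and NONE are both four characters long, so it is probably best reserved for columns whose possible values are known in advance.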


Option 3: Vectorized title

str.title() was also suggested in @0p3n5ourcE's answer; here is a vectorized version of it:

df[df['ConfigredValue'].str.title()!=df['ReferenceValue'].str.title()]
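To make the effect of `str.title()` visible, the following sketch applies it to the sample values from the question, held here in two standalone Series (`values` and `reference` are illustrative names, not part of the original answer).

```python
import pandas as pd

values = pd.Series(["False", "FALSE", "TRUE", "FALSE", "false"])
reference = pd.Series(["FALSE"] * 5)

# Title-casing maps every spelling to "True" or "False".
titled = values.str.title()
print(titled.tolist())

# Compare the normalized values element by element.
mask = titled != reference.str.title()
print(mask.tolist())
```

For single-word values like these, title-casing behaves the same as lowercasing for comparison purposes; the choice between them is mostly a matter of taste.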


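Execution time

Benchmarking on the sample dataframe shows that str.len() is slightly faster than the other options, although the difference is small: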

In [35]: timeit df[df['ConfigredValue'].str.lower()!=df['ReferenceValue'].str.lower()]
1000 loops, best of 3: 496 µs per loop

In [36]: timeit df[df['ConfigredValue'].str.upper()!=df['ReferenceValue'].str.upper()]
1000 loops, best of 3: 496 µs per loop

In [37]: timeit df[df['ConfigredValue'].str.title()!=df['ReferenceValue'].str.title()]
1000 loops, best of 3: 495 µs per loop

In [38]: timeit df[df['ConfigredValue'].str.len()!=df['ReferenceValue'].str.len()]
1000 loops, best of 3: 479 µs per loop
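The timings above come from a five-row frame in an IPython session. To repeat the comparison outside IPython, a rough sketch with the standard `timeit` module could look like this; the frame size (`n_repeats`) and the `candidates` dictionary are assumptions made for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical larger frame: the five sample values repeated 2,000 times.
n_repeats = 2_000
df = pd.DataFrame({
    "ConfigredValue": ["False", "FALSE", "TRUE", "FALSE", "false"] * n_repeats,
    "ReferenceValue": ["FALSE"] * (5 * n_repeats),
})

# One callable per option discussed above.
candidates = {
    "lower": lambda: df[df["ConfigredValue"].str.lower() != df["ReferenceValue"].str.lower()],
    "upper": lambda: df[df["ConfigredValue"].str.upper() != df["ReferenceValue"].str.upper()],
    "title": lambda: df[df["ConfigredValue"].str.title() != df["ReferenceValue"].str.title()],
    "len": lambda: df[df["ConfigredValue"].str.len() != df["ReferenceValue"].str.len()],
}

calls_per_run = 20
timings = {}
for name, func in candidates.items():
    # Best of 3 runs, converted to seconds per call.
    best = min(timeit.repeat(func, number=calls_per_run, repeat=3))
    timings[name] = best / calls_per_run
    print(f"{name}: {timings[name] * 1e6:.0f} µs per call")
```

On this data all four options select the same rows, so any speed difference has to be weighed against the fact that only the case-normalizing variants stay correct for arbitrary strings.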
