Filtering pandas dataframe rows by contains str
Question
I have a pandas dataframe df with a lot of rows. From those rows, I want to slice out and use only the rows that contain the word 'ball' in the 'body' column. To do that, I can do:
df[df['body'].str.contains('ball')]
The issue is that I want the search to be case insensitive, meaning that if Ball or bAll shows up, I want those rows as well. One way to do a case-insensitive search is to convert the strings to lowercase and then search. I'm wondering how to go about doing that. I tried
df[df['body'].str.lower().contains('ball')]
But that doesn't work. I'm not sure whether I'm supposed to use a lambda function for this or something of that nature.
Recommended answer
You could either use .str again to get access to the string methods, or (better, IMHO) use case=False to guarantee case insensitivity:
>>> import pandas as pd
>>> df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})
>>> df[df["body"].str.contains("ball")]
   body
0  ball
>>> df[df["body"].str.lower().str.contains("ball")]
       body
0      ball
1  red BALL
>>> df[df["body"].str.contains("ball", case=False)]
       body
0      ball
1  red BALL
>>> df[df["body"].str.contains("ball", case=True)]
   body
0  ball
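For what it's worth, the attempt in the question most likely fails because .str.lower() returns an ordinary Series, and a Series has no contains method of its own; the string methods are only reachable through the .str accessor. A minimal sketch of that, reusing the sample frame from above (the variable names are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})

# .str.lower() hands back a plain Series of lowercased strings.
lowered = df["body"].str.lower()

# A Series has no .contains attribute, so the version from the question raises.
try:
    lowered.contains("ball")
    failed = False
except AttributeError:
    failed = True

# Going back through .str works, and selects the same rows as case=False.
via_lower = df[lowered.str.contains("ball")]
via_flag = df[df["body"].str.contains("ball", case=False)]
```

Both selections should pick the same rows; case=False simply skips the intermediate lowercased Series.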
(Note that if you're going to be doing assignments, it's a better habit to use df.loc, to avoid the dreaded SettingWithCopyWarning, but since we're only selecting here it doesn't matter.)
(Note #2: I guess I really didn't need to specify 'round' there.)
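To illustrate the df.loc remark, here is a rough sketch of assigning to the matching rows rather than just selecting them. The has_ball column is a hypothetical name made up for this example; the idea is only that the row mask and the column label go into a single .loc call:

```python
import pandas as pd

df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})

# Build the boolean mask once; case=False makes the match case insensitive.
mask = df["body"].str.contains("ball", case=False)

# "has_ball" is a hypothetical column name used only for this illustration.
df["has_ball"] = False

# Assign through .loc with (row mask, column label) in one indexing step,
# instead of chaining df[mask]["has_ball"] = True.
df.loc[mask, "has_ball"] = True
```

Chained indexing such as df[mask]["has_ball"] = True may end up writing to a temporary copy, which is the situation the warning is meant to flag.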