Search for String in all Pandas DataFrame columns and filter

Question

I thought this would be straightforward, but I had some trouble tracking down an elegant way to search all columns in a DataFrame at the same time for a partial string match. Basically, how would I apply df['col1'].str.contains('^') to an entire DataFrame at once and filter down to the rows in which any column contains the match?

Recommended answer

The Series.str.contains method expects a regex pattern (by default), not a literal string. Therefore str.contains("^") matches the beginning of any string. Since every string has a beginning, everything matches. Instead, use str.contains(r"\^") to match the literal ^ character. The raw-string prefix passes the backslash through to the regex engine unchanged and avoids Python's invalid-escape-sequence warning for "\^".
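The difference between the anchor and the escaped caret is easy to check on a small Series. The following is a minimal sketch; the sample values are made up for illustration, and re.escape is shown as one way to escape a pattern without writing the backslash by hand.

```python
import re

import pandas as pd

# Hypothetical sample values, not taken from the question.
s = pd.Series(["a^b", "plain", "x"])

# As a regex, "^" anchors at the start of the string, so every value matches.
anchor_hits = s.str.contains("^")

# Escaping the caret in a raw string matches the literal character.
literal_hits = s.str.contains(r"\^")

# re.escape builds the escaped pattern for you.
escaped_hits = s.str.contains(re.escape("^"))

print(anchor_hits.tolist())   # [True, True, True]
print(literal_hits.tolist())  # [True, False, False]
print(escaped_hits.tolist())  # [True, False, False]
```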

To check every column, you could use for col in df to iterate through the column names, and then call str.contains on each column:

import numpy as np
# One boolean column per DataFrame column; keep rows where any column matched.
mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]
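A self-contained version of this approach might look like the sketch below. The DataFrame and its column names are hypothetical. One assumption goes beyond the answer: each column is cast to str first, because the .str accessor raises an AttributeError on non-string columns such as integers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
df = pd.DataFrame({
    "col1": ["a^b", "foo", None],
    "col2": ["bar", "baz", "q^"],
    "num": [1, 2, 3],
})

# Assumption: cast each column to str so numeric columns can be searched too.
# na=False makes any remaining missing values count as non-matches.
mask = np.column_stack(
    [df[col].astype(str).str.contains(r"\^", na=False) for col in df]
)

# Keep the rows where at least one column contains a literal caret.
result = df.loc[mask.any(axis=1)]
print(result.index.tolist())  # [0, 2]

# A pandas-only variant of the same mask, without np.column_stack.
alt_mask = df.apply(lambda c: c.astype(str).str.contains(r"\^", na=False))
alt_result = df.loc[alt_mask.any(axis=1)]
```

If only some columns hold text, restricting the loop to those columns avoids the cast altogether.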

Alternatively, you could pass regex=False to str.contains to make the test use the Python in operator, so that "^" is treated as a plain substring and needs no escaping; but (in general) using regex is faster. The difference depends on the data and the pandas version, so time both if it matters.
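As a rough illustration of the regex=False option, the sketch below searches for two characters that are special in a regex, the caret and the dot, without escaping either. The sample data is hypothetical and every column is assumed to hold strings.

```python
import pandas as pd

# Hypothetical sample data; every column is assumed to hold strings.
df = pd.DataFrame({"col1": ["a^b", "foo"], "col2": ["bar", "c.d"]})

# With regex=False the pattern is a plain substring, so "^" needs no escaping.
caret_mask = df.apply(lambda col: col.str.contains("^", regex=False, na=False))
caret_rows = df.loc[caret_mask.any(axis=1)]

# The same holds for ".", which as a regex would match any character.
dot_mask = df.apply(lambda col: col.str.contains(".", regex=False, na=False))
dot_rows = df.loc[dot_mask.any(axis=1)]

print(caret_rows.index.tolist())  # [0]
print(dot_rows.index.tolist())    # [1]
```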
