Search for String in all Pandas DataFrame columns and filter
Question
I thought this would be straightforward, but I had some trouble tracking down an elegant way to search all columns in a DataFrame at the same time for a partial string match. Basically, how would I apply `df['col1'].str.contains('^')` to an entire DataFrame at once and filter down to any rows that have records containing the match?
Recommended answer
The `Series.str.contains` method expects a regex pattern (by default), not a literal string. Therefore `str.contains("^")` matches the beginning of any string. Since every string has a beginning, everything matches. Instead, use `str.contains(r"\^")` to match the literal `^` character.
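To see the difference concretely, here is a small sketch on a made-up Series (the sample values are assumptions chosen for illustration, not data from the question):

```python
import pandas as pd

s = pd.Series(["a^b", "abc", "^start"])  # hypothetical sample data

# Unescaped "^" is a regex anchor: it matches at the start of every string.
anchor_matches = s.str.contains("^")

# Escaped r"\^" matches only strings that contain a literal caret.
literal_matches = s.str.contains(r"\^")

print(anchor_matches.tolist())   # [True, True, True]
print(literal_matches.tolist())  # [True, False, True]
```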
To check every column, you could use `for col in df` to iterate through the column names, and then call `str.contains` on each column:
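import numpy as np

# One boolean column per DataFrame column; keep rows where any column matches a literal "^"
mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]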
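As a runnable sketch of this approach, the example below builds a small DataFrame and applies the mask. The column names and values are hypothetical. Because the `.str` accessor raises an error on non-string columns, the sketch casts each column with `astype(str)` first; that cast is an assumption about how one might handle mixed dtypes, not part of the answer itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are made up.
df = pd.DataFrame({
    "col1": ["a^b", "plain", None],
    "col2": ["x", "y^", "z"],
    "col3": [1, 2, 3],  # non-string column
})

# Cast each column to str so .str.contains works on every dtype
# (an assumption for mixed-dtype frames, not part of the original answer).
mask = np.column_stack(
    [df[col].astype(str).str.contains(r"\^", na=False) for col in df]
)

# Keep the rows where at least one column contains a literal caret.
filtered = df.loc[mask.any(axis=1)]
print(filtered.index.tolist())  # [0, 1]
```

If every column is already string-typed, the `astype(str)` call can be dropped and the code reduces to the two lines shown above.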
Alternatively, you could pass `regex=False` to `str.contains` to make the test use the Python `in` operator; but (in general) using regex is faster.
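A minimal sketch of the `regex=False` variant, again on made-up values. It only illustrates the matching behavior; it makes no measurement of speed, which may vary with the data and the pandas version.

```python
import pandas as pd

s = pd.Series(["a^b", "abc", None])  # hypothetical sample data

# regex=False treats the pattern as a plain substring, so no escaping is needed.
plain = s.str.contains("^", regex=False, na=False)

print(plain.tolist())  # [True, False, False]
```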