Matching keywords (strings) with a Pandas DataFrame
Question
I have a DataFrame that I want to match against some keywords (I want to detect the rows that contain those keywords). I managed to get the job done with the line below, but I wonder whether there is a better way, given that I might have up to 10 or 20 different keywords.
df1 = df[df['column1'].str.contains("keyword1") | df['column1'].str.contains('keyword2')]
(I'm a beginner, so please keep it as simple as possible.)
Recommended answer
For OR logic, you can create a single pattern by joining the words with |. Store your 10-20 words in a list, then build the pattern with '|'.join(that_list).
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': ['foo', 'bar', 'baz', 'foobar', 'boo']})
words = ['foo', 'bar']
df['foo_OR_bar'] = df['col1'].str.contains('|'.join(words))
#      col1  foo_OR_bar
# 0     foo        True
# 1     bar        True
# 2     baz       False
# 3  foobar        True
# 4     boo       False

# To slice by that Boolean Series
df1 = df.loc[df['col1'].str.contains('|'.join(words))]
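One caveat about the joined pattern: str.contains treats it as a regular expression by default, so a keyword containing characters such as + or . may raise an error or match something other than the literal text. The following is a minimal sketch, assuming the keywords should be matched literally. The sample data and keyword list are hypothetical and not from the question.

```python
import re
import pandas as pd

# Hypothetical sample data; the last row is a missing value
df = pd.DataFrame({'col1': ['c++ tips', 'a.b', 'axb', 'plain', None]})
words = ['c++', 'a.b']  # hypothetical keywords containing regex metacharacters

# Escape each keyword so it is matched literally, then join with |
pattern = '|'.join(re.escape(w) for w in words)

# na=False counts missing values as "no match" instead of propagating NaN
mask = df['col1'].str.contains(pattern, na=False)
df1 = df.loc[mask]
```

Here 'axb' is not selected because the escaped dot no longer matches an arbitrary character, and the missing value is treated as a non-match.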
If your joining logic is AND, you can use np.logical_and.reduce with a list comprehension to keep things compact.
df['foo_AND_bar'] = np.logical_and.reduce([df.col1.str.contains(w) for w in words])
#      col1  foo_OR_bar  foo_AND_bar
# 0     foo        True        False
# 1     bar        True        False
# 2     baz       False        False
# 3  foobar        True         True
# 4     boo       False        False
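If you need both kinds of matching in several places, the two approaches above can be wrapped in one small function. This is only a sketch: the name rows_matching and its mode parameter are made up for illustration, and it assumes the keywords are plain substrings (hence regex=False) and that the keyword list is not empty.

```python
import numpy as np
import pandas as pd

def rows_matching(df, column, words, mode="any"):
    """Return the rows whose `column` contains any (or all) of `words`."""
    # One Boolean Series per keyword; regex=False does a plain substring test
    masks = [df[column].str.contains(w, regex=False, na=False) for w in words]
    if mode == "any":
        mask = np.logical_or.reduce(masks)   # OR across keywords
    else:
        mask = np.logical_and.reduce(masks)  # AND across keywords
    return df.loc[mask]

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz', 'foobar', 'boo']})
any_rows = rows_matching(df, 'col1', ['foo', 'bar'], mode="any")
all_rows = rows_matching(df, 'col1', ['foo', 'bar'], mode="all")
```

With the sample data from the answer, any_rows holds the rows 'foo', 'bar' and 'foobar', while all_rows holds only 'foobar'.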