Matching keywords (strings) with a Pandas DataFrame


Question

I have a DataFrame that I want to match against some keywords (I want to detect the rows that contain those keywords). I managed to get the job done the way shown below, but I wonder whether there is a better way, given that I might have up to 10 or 20 different keywords.

df1 = df[df['column1'].str.contains("keyword1") | df['column1'].str.contains('keyword2')]

(I'm a beginner, so please keep it as simple as possible.)

Recommended answer

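For OR logic, you can create a single pattern by joining the words with |. Store your 10-20 words in a list, then build the pattern with '|'.join(that_list). This works because str.contains treats its pattern as a regular expression by default, and | means "either side matches" in a regular expression.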

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz', 'foobar', 'boo']})
words = ['foo', 'bar']

df['foo_OR_bar'] = df['col1'].str.contains('|'.join(words))

#     col1  foo_OR_bar
#0     foo        True
#1     bar        True
#2     baz       False
#3  foobar        True
#4     boo       False

# To keep only the matching rows, slice the DataFrame with that Boolean Series
df1 = df.loc[df['col1'].str.contains('|'.join(words))]
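
Because the joined pattern is a regular expression, keywords that contain characters such as +, . or ( are read as regex syntax rather than literal text. The sketch below is one possible way to guard against that, assuming you want literal, case-insensitive matching and want missing values to count as non-matches. The sample data and the keyword 'c++' are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical sample data, including a missing value
df = pd.DataFrame({'col1': ['foo', 'bar', 'C++ tips', 'FOObar', None]})
words = ['foo', 'c++']  # '+' is a regex metacharacter

# Escape each keyword so it is matched literally, then join with '|'
pattern = '|'.join(re.escape(w) for w in words)

# case=False ignores upper/lower case; na=False treats missing values as non-matches
mask = df['col1'].str.contains(pattern, case=False, na=False)
df1 = df.loc[mask]
```

If your keywords are plain words with no special characters, the unescaped '|'.join(words) from the answer is enough.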

If your joining logic is AND (a row must contain every keyword), you can use np.logical_and.reduce with a list comprehension to keep things compact. The list comprehension builds one Boolean Series per keyword, and reduce combines them element-wise with AND.

df['foo_AND_bar'] = np.logical_and.reduce([df.col1.str.contains(w) for w in words])

#     col1  foo_OR_bar  foo_AND_bar
#0     foo        True        False
#1     bar        True        False
#2     baz       False        False
#3  foobar        True         True
#4     boo       False        False
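
If np.logical_and.reduce feels opaque, the same AND logic can be written as a plain loop. This is only a sketch of an equivalent approach, not part of the original answer. It assumes the keywords should be matched as literal substrings, hence regex=False, and the names mask and df_and are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz', 'foobar', 'boo']})
words = ['foo', 'bar']

# Start with every row selected, then narrow the selection one keyword at a time
mask = pd.Series(True, index=df.index)
for w in words:
    # regex=False matches the keyword as plain text rather than as a pattern
    mask = mask & df['col1'].str.contains(w, regex=False)

# Keep only the rows that contain all keywords
df_and = df.loc[mask]
```

Only 'foobar' contains both keywords, so it is the only row kept, which matches the foo_AND_bar column above.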
