可伸缩的解决方案，包含带有pandas中字符串列表的str.contains [英] Scalable solution for str.contains with list of strings in pandas

查看：67 发布时间：2020/5/23 21:43:48 python regex string pandas dataframe

本文介绍了可伸缩的解决方案，包含带有pandas中字符串列表的str.contains的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在解析包含字符串对象行的pandas数据框df1.我有一个关键字参考列表，需要删除df1中包含参考列表中任何单词的每一行.

I am parsing a pandas dataframe df1 containing string object rows. I have a reference list of keywords and need to delete every row in df1 containing any word from the reference list.

当前，我是这样的:

reference_list: ["words", "to", "remove"]
df1 = df1[~df1[0].str.contains(r"words")]
df1 = df1[~df1[0].str.contains(r"to")]
df1 = df1[~df1[0].str.contains(r"remove")]

不能将其扩展到数千个单词.但是，当我这样做时:

df1 = df1[~df1[0].str.contains(reference_word for reference_word in reference_list)]

我产生错误第一个参数必须是字符串或编译模式.

以下此解决方案，我尝试了:

Following this solution, I tried:

reference_list: "words|to|remove" 
df1 = df1[~df1[0].str.contains(reference_list)]

不会引发异常，但不会解析所有单词.

Which doesn't raise an exception but doesn't parse all words eather.

如何有效地使用带有单词列表的str.contains?

推荐答案

对于可伸缩解决方案，请执行以下操作-

For a scalable solution, do the following -

通过正则表达式或管道|
将此内容传递给str.contains
使用结果过滤df1

join the contents of words by the regex OR pipe |
pass this to str.contains
use the result to filter df1

要为第0 列建立索引，请不要使用df1[0](因为这可能被认为是模棱两可的).最好使用loc或iloc(请参见下文).

To index the 0^th column, don't use df1[0] (as this might be considered ambiguous). It would be better to use loc or iloc (see below).

words = ["words", "to", "remove"]
mask = df1.iloc[:, 0].str.contains(r'\b(?:{})\b'.format('|'.join(words)))
df1 = df1[~mask]

注意:如果words是系列，这也将起作用.

Note: This will also work if words is a Series.

或者，如果您的第0 列仅是单词(不是句子)列，则可以使用df.isin，它应该更快-

Alternatively, if your 0^th column is a column of words only (not sentences), then you can use df.isin, which should be faster -

df1 = df1[~df1.iloc[:, 0].isin(words)]

这篇关于可伸缩的解决方案，包含带有pandas中字符串列表的str.contains的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

可伸缩的解决方案，包含带有pandas中字符串列表的str.contains [英] Scalable solution for str.contains with list of strings in pandas

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

可伸缩的解决方案，包含带有pandas中字符串列表的str.contains [英] Scalable solution for str.contains with list of strings in pandas

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭