How to test if a string contains one of the substrings in a list, in pandas?

Question

Is there any function that would be the equivalent of a combination of df.isin() and df[col].str.contains()?

For example, say I have the series s = pd.Series(['cat','hat','dog','fog','pet']) and I want to find all places where s contains any of ['og', 'at']. I would want to get everything but 'pet'.

I have a solution, but it's rather inelegant:

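import pandas as pd
s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']
found = [s.str.contains(x) for x in searchfor]  # one boolean Series per substring
result = pd.DataFrame(found)  # rows = substrings, columns = elements of s
result.any()  # True where at least one substring matched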

Is there a better way?

Answer

One option is to use the regex alternation character | so that a single str.contains call matches any of the substrings against the words in your Series s.

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object
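As a self-contained sketch of the same idea, the snippet below builds the pattern once and keeps the boolean mask in its own variable so it can be reused for filtering or counting. The names pattern and mask are illustrative only, not part of the original answer.

```python
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# Alternation: 'og|at' matches if either substring appears anywhere in the string
pattern = '|'.join(searchfor)

# Boolean mask: True for each element containing at least one substring
mask = s.str.contains(pattern)

# Keep only the matching elements (everything except 'pet')
result = s[mask]
print(result.tolist())  # ['cat', 'hat', 'dog', 'fog']
```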

As @AndyHayden noted in the comments below, take care if your substrings have special characters such as $ and ^ which you want to match literally. These characters have specific meanings in the context of regular expressions and will affect the matching.
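To see the problem concretely, here is a small sketch with made-up sample data. When the substrings are joined without escaping, $ acts as an end-of-string anchor and ^ as a start-of-string anchor, so neither alternative can ever match and every row comes back False.

```python
import pandas as pd

s = pd.Series(['$money', 'x^y', 'cash'])
matches = ['$money', 'x^y']

# Unescaped pattern '$money|x^y': '$' must be at the end of the string and
# '^' at the start, so as regexes neither alternative can match anything
naive = s.str.contains('|'.join(matches))
print(naive.tolist())  # [False, False, False]
```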

You can make your list of substrings safer by escaping the characters that have a special meaning in regular expressions with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

str.contains一起使用时,此新列表中带有的字符串将逐字匹配每个字符.

The strings in this new list will match each character literally when used with str.contains.
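Putting the two pieces together, here is a minimal sketch (sample data made up for illustration) that escapes each substring before joining them with |, so the special characters are matched literally while the alternation still works.

```python
import re
import pandas as pd

s = pd.Series(['$money', 'x^y', 'cash', 'pay $money now'])
matches = ['$money', 'x^y']

# Escape each substring first, then join with the alternation operator
pattern = '|'.join(re.escape(m) for m in matches)

mask = s.str.contains(pattern)
print(s[mask].tolist())  # ['$money', 'x^y', 'pay $money now']
```

Note that re.escape is applied to each substring before the join. Escaping the already-joined string would also escape the | separators, turning the pattern into a search for one long literal string.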