How to test if a string contains one of the substrings in a list, in pandas?

Views: 97 | Published: 2020/5/6 9:19:04 | Tags: python string pandas dataframe match


Question

Is there a function equivalent to a combination of df.isin() and df[col].str.contains()?

For example, say I have the series s = pd.Series(['cat','hat','dog','fog','pet']) and I want to find all places where s contains any of ['og', 'at']. I would want to get everything but 'pet'.

I have a solution, but it's rather inelegant:

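searchfor = ['og', 'at']
# One boolean Series per substring
found = [s.str.contains(x) for x in searchfor]
# Stack the masks into a DataFrame (one row per substring), then reduce column-wise
result = pd.DataFrame(found)
result.any()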

Is there a better way to do this?

Recommended answer

One option is to use the regex | character (alternation) to try to match any of the substrings in the words of your Series s, still using str.contains.

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object
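As a self-contained sketch of the same idea, the snippet below builds the boolean mask with the joined pattern and compares it with the loop-based approach from the question. The variable names mask and mask_loop are illustrative and not part of the original answer.

```python
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# Join the substrings into one alternation pattern: 'og|at'
pattern = '|'.join(searchfor)
mask = s.str.contains(pattern)

# The question's approach: one mask per substring, combined with any()
mask_loop = pd.DataFrame([s.str.contains(x) for x in searchfor]).any()

print(mask.tolist())                        # [True, True, True, True, False]
print(mask.tolist() == mask_loop.tolist())  # both approaches agree
```

Indexing with s[mask] then gives the filtered Series shown above.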

As @AndyHayden noted in the comments, take care if your substrings contain special characters such as $ and ^ that you want to match literally. These characters have specific meanings in regular expressions and will affect the matching.
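To illustrate the pitfall, here is a small sketch using made-up data (the Series below is an assumption for demonstration only). Interpreted as a regex, $ anchors the end of the string, so the pattern '$money' matches nothing. Passing regex=False to str.contains treats the pattern as a plain substring instead.

```python
import pandas as pd

# Hypothetical sample data containing regex special characters
s = pd.Series(['$money', 'money', 'x^y'])

# As a regex, '$' means "end of string", so '$money' can never match
as_regex = s.str.contains('$money')

# With regex=False the pattern is matched as a literal substring
as_literal = s.str.contains('$money', regex=False)

print(as_regex.tolist())    # [False, False, False]
print(as_literal.tolist())  # [True, False, False]
```

Note that regex=False only helps when searching for a single substring, because the | alternation is itself regex syntax.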

You can make your list of substrings safer by escaping the regex special characters with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

The strings in this new list will match each character literally when used with str.contains.
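Putting the two pieces together, a minimal sketch might look like the following. The sample Series is made up for illustration; the approach is simply re.escape on each substring followed by the same '|'.join as before.

```python
import re
import pandas as pd

# Hypothetical sample data; only the first and third entries contain a target substring
s = pd.Series(['$money', 'money', 'x^y', 'xy'])
matches = ['$money', 'x^y']

# Escape each substring, then join with '|' to form one safe pattern
pattern = '|'.join(re.escape(m) for m in matches)

result = s[s.str.contains(pattern)]
print(result.tolist())  # ['$money', 'x^y']
```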
