Filtering DataFrame by finding exact word (not combined) in a column of strings

Views: 67 | Published: 2020/5/23 23:39:42 | Tags: python regex string pandas dataframe


Question

My DataFrame has two columns:

Name  Status
 a    I am Good
 b    Goodness!!!
 c    Good is what i feel
 d    Not Good-at-all

I want to keep only the rows in which Status contains the string 'Good' as an exact word, not combined with any other words or characters.

So the output would be:

Name  Status
a    I am Good
c    Good is what i feel

The other two rows also contain the string 'Good', but mixed with other characters, so they should not be picked up.

I tried:

d = df[df['Status'].str.contains('Good')]  # But all rows come up

I believe a regex like (r'\bGood\b', Status) would do that, but I can't work out how to put it together. How and where exactly can I fit the regex into a DataFrame filter condition to achieve this? And how can I match rows where Status starts with or ends with 'Good' (again as an exact word)?

Recommended answer

If you're defining "exact" to mean no other adjacent characters (including punctuation, which counts as a word boundary for \b), you could instead check for a leading and trailing whitespace character and/or the beginning/end anchors:

>>> df[df['Status'].str.contains(r'(?:\s|^)Good(?:\s|$)')]
  Name               Status
0    a            I am Good
2    c  Good is what i feel
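
To reproduce the result above, here is a minimal self-contained sketch. It rebuilds the sample DataFrame from the question and compares the \bGood\b pattern the asker suggested with the whitespace-delimited pattern. The variable names (boundary_mask, exact_mask and so on) are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Status": ["I am Good", "Goodness!!!", "Good is what i feel", "Not Good-at-all"],
})

# \b sees the hyphen in "Good-at-all" as a word boundary, so row d matches too.
boundary_mask = df["Status"].str.contains(r"\bGood\b")

# Requiring whitespace or a string edge on both sides excludes row d.
exact_mask = df["Status"].str.contains(r"(?:\s|^)Good(?:\s|$)")

boundary_names = df.loc[boundary_mask, "Name"].tolist()
exact_names = df.loc[exact_mask, "Name"].tolist()

print(boundary_names)  # ['a', 'c', 'd']
print(exact_names)     # ['a', 'c']
```

Both patterns reject "Goodness!!!". They differ only on "Not Good-at-all", where punctuation follows the word.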

Explanation:

(?:\s|^) is a non-capturing group that matches either a whitespace character (\s) or the beginning of the string (^).

Good is the word you're looking for.

(?:\s|$) is a non-capturing group that matches either a whitespace character (\s) or the end of the string ($).
