Find substring in string but only if whole words?


Question

What is an elegant way to look for a string within another string in Python, but only if the substring is within whole words, not part of a word?

Perhaps an example will demonstrate what I mean:

string1 = "ADDLESHAW GODDARD"
string2 = "ADDLESHAW GODDARD LLP"
assert string_found(string1, string2)  # this is True
string1 = "ADVANCE"
string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
assert not string_found(string1, string2)  # this should be False

How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:

def string_found(string1, string2):
   # find() returns -1 when nothing is found, so test against -1 explicitly
   if string2.find(string1 + " ") != -1:
      return True
   return False

But that doesn't feel very elegant, and also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)

Recommended answer

You can use regular expressions and the word boundary special character \b (emphasis mine):

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
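To make that definition concrete, here is a small sketch of where \b does and does not match. The sample strings and the helper name matches are made up for illustration; only the standard re module is used.

```python
import re

# \b matches between a word character (\w) and a non-word character (\W),
# and at the start or end of the string when a word character sits there.
pattern = re.compile(r"\bADVANCE\b")

def matches(text):
    # Hypothetical helper: True if "ADVANCE" occurs as a whole word in text
    return pattern.search(text) is not None

results = {
    "ADVANCE": matches("ADVANCE"),            # start and end of string count
    "ADVANCE, LTD": matches("ADVANCE, LTD"),  # the comma is a non-word character
    "ADVANCED": matches("ADVANCED"),          # "D" is a word character, no boundary
    "ADVANCE_2": matches("ADVANCE_2"),        # underscore is a word character too
}
```

So punctuation ends a word, but a letter, digit, or underscore directly after the text does not.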

import re

def string_found(string1, string2):
   # \b anchors the match at word boundaries; re.escape makes string1 a literal
   if re.search(r"\b" + re.escape(string1) + r"\b", string2):
      return True
   return False
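As a quick check, here is a self-contained sketch that runs the same technique against the examples from the question. It returns a bool directly, which is a stylistic choice rather than a change in behavior; the third and fourth calls are extra illustrations that are not in the original post.

```python
import re

def string_found(string1, string2):
    # Build \b<literal>\b and report whether it occurs anywhere in string2
    pattern = r"\b" + re.escape(string1) + r"\b"
    return re.search(pattern, string2) is not None

print(string_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))  # True
print(string_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
print(string_found("LTD", "ADVANCED BUSINESS EQUIPMENT LTD"))      # True (match at the end)
print(string_found("C++", "I like C++ a lot"))                     # False (see note)
```

One caveat: \b only matches next to a word character, so a search string that starts or ends with punctuation (such as "C++") will not be found this way even when it stands alone.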

If only whitespace counts as a word boundary for you, you could also get away with prepending and appending a space to both strings:

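def string_found(string1, string2):
   # Pad both strings so that matches at either end are still surrounded by spaces
   string1 = " " + string1.strip() + " "
   string2 = " " + string2.strip() + " "
   # find() returns -1 when there is no match, so compare explicitly
   return string2.find(string1) != -1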
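The padding trick assumes words are separated by single spaces. If tabs or repeated spaces can occur, one possible variant is to normalize whitespace first. This is only a sketch, and string_found_ws is a hypothetical name, not something from the original answer.

```python
def string_found_ws(string1, string2):
    # split() with no arguments splits on any run of whitespace,
    # so joining with " " collapses tabs and repeated spaces
    needle = " " + " ".join(string1.split()) + " "
    haystack = " " + " ".join(string2.split()) + " "
    return needle in haystack

print(string_found_ws("ADDLESHAW GODDARD", "ADDLESHAW   GODDARD\tLLP"))  # True
print(string_found_ws("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))     # False
print(string_found_ws("ADVANCE", "ADVANCE, LTD"))                        # False
```

Unlike the \b version, this treats punctuation as part of the word, so "ADVANCE" is not found in "ADVANCE, LTD".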
