Find substring in string but only if whole words?
Question
What is an elegant way to look for a string within another string in Python, but only if the substring consists of whole words, not part of a word?
Perhaps an example will demonstrate what I mean:
string1 = "ADDLESHAW GODDARD"
string2 = "ADDLESHAW GODDARD LLP"
assert string_found(string1, string2) # this is True
string1 = "ADVANCE"
string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
assert not string_found(string1, string2) # this should be False
How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:
def string_found(string1, string2):
    # str.find returns -1 (which is truthy) when nothing is found
    if string2.find(string1 + " ") != -1:
        return True
    return False
But that doesn't feel very elegant, and it also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)
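To make the limitation concrete, here is a small sketch of the trailing-space idea. The helper name naive_found is hypothetical and it uses the `in` operator rather than str.find; it is only meant to show where the approach breaks down, not to be a recommended solution.

```python
def naive_found(string1, string2):
    # Hypothetical helper: look for string1 followed by a single space.
    return (string1 + " ") in string2


# Works when string1 is followed by a space.
print(naive_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))  # True
# Misses a match at the very end of string2 (no trailing space there).
print(naive_found("LLP", "ADDLESHAW GODDARD LLP"))  # False
# Also accepts a match that begins in the middle of a word.
print(naive_found("GODDARD", "ADDLESHAW McGODDARD LLP"))  # True
```

So the trailing space only guards the right-hand edge of the match, and only when something follows it.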
Recommended answer
You can use regular expressions and the word boundary special character \b (emphasis mine):
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals.
import re

def string_found(string1, string2):
    # re.escape keeps any regex metacharacters in string1 literal
    if re.search(r"\b" + re.escape(string1) + r"\b", string2):
        return True
    return False
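As a quick check, the sketch below runs the same \b pattern against the two cases from the question, plus two extra inputs that are my own illustrations (the comma and "C++" strings are not from the original post). It returns a bool directly, which is a small stylistic variation.

```python
import re


def string_found(string1, string2):
    # Same pattern as above, returning a bool directly.
    pattern = r"\b" + re.escape(string1) + r"\b"
    return re.search(pattern, string2) is not None


print(string_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))  # True
print(string_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
# Punctuation also counts as a boundary, so a trailing comma is fine.
print(string_found("GODDARD", "ADDLESHAW GODDARD, LLP"))  # True
# Caveat: \b needs a word character on one side, so a needle that ends
# in a non-word character such as "+" is not found here.
print(string_found("C++", "I like C++"))  # False
```

The last case suggests that \b fits best when the search string itself starts and ends with letters, digits, or underscores.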
If word boundaries are only whitespace for you, you could also get away with prepending and appending a space to each string:
def string_found(string1, string2):
    string1 = " " + string1.strip() + " "
    string2 = " " + string2.strip() + " "
    # str.find returns -1 when there is no match, so compare explicitly
    return string2.find(string1) != -1
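A short sketch of how the space-padding variant behaves, assuming words are separated by single spaces. The local names padded1 and padded2 and the comma example are illustrative additions, and `in` is used here in place of str.find.

```python
def string_found(string1, string2):
    # Pad both strings so matches at either end are still space-delimited.
    padded1 = " " + string1.strip() + " "
    padded2 = " " + string2.strip() + " "
    return padded1 in padded2


# A match at the end of string2 is now found.
print(string_found("LLP", "ADDLESHAW GODDARD LLP"))  # True
print(string_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
# Unlike the \b version, punctuation is not treated as a boundary here.
print(string_found("GODDARD", "ADDLESHAW GODDARD, LLP"))  # False
```

Whether the last result is desirable depends on whether punctuation should end a word in your data.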