Exact match for words

Question

I would like to use a regular expression that matches if a sentence contains one of the words I am looking for.

Right now all of these match, which is not correct: #2 and #3 should not. I tried padding every word in words with spaces (like " seven "), but then a word at the end of the string doesn't match (and neither does one at the very start).

import re

words = ('seven', 'eight')
regex = re.compile('|'.join(words))
print(regex.search('aaaaaasd seven asdfadsf'))  #1 - should match
print(regex.search('AAAsevenAAA'))              #2 - shouldn't match
print(regex.search('AAA eightaaa'))             #3 - shouldn't match
print(regex.search('eight aaa'))                #4 - should match
print(regex.search('aaaa eight'))               #5 - should match
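
The space-padding attempt described in the question can be reproduced with a small sketch like the one below (the name padded is just for illustration). It rejects the substring cases, but it also rejects any word that sits at the start or end of the string, because there is no space there to match.

```python
import re

words = ('seven', 'eight')
# Pad every word with spaces, as the question describes trying: ' seven | eight '
padded = re.compile('|'.join(' %s ' % w for w in words))

print(padded.search('aaaaaasd seven asdfadsf'))  # matches: spaces on both sides
print(padded.search('AAAsevenAAA'))              # None: substring correctly rejected
print(padded.search('eight aaa'))                # None: no space before "eight"
print(padded.search('aaaa eight'))               # None: no space after "eight"
```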

How can I make my regular expression not match when the matching word is only a substring of a longer word (like #2 and #3 above)?

Recommended answer

As @CasimiretHippolyte pointed out, you want to add word boundaries (\b). If you don't want to add them by hand to each word in your list, build them into the compiled regular expression instead:

regex = re.compile(r'\b(?:%s)\b' % '|'.join(words))
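
A quick way to check the answer's pattern against all five cases from the question is the sketch below. The helper word_regex is hypothetical and not part of the original answer: it additionally passes each word through re.escape, a common precaution if a word might contain regex metacharacters such as ".". It assumes every word begins and ends with a word character, since \b only behaves as expected next to one.

```python
import re

def word_regex(words):
    # Hypothetical helper (not from the original answer): re.escape makes
    # characters like '.' literal; \b on both sides of the non-capturing
    # group applies the boundaries to every alternative.
    return re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, words)))

words = ('seven', 'eight')
regex = word_regex(words)  # same pattern as \b(?:seven|eight)\b

cases = [
    ('aaaaaasd seven asdfadsf', True),  # 1 - should match
    ('AAAsevenAAA', False),             # 2 - shouldn't match
    ('AAA eightaaa', False),            # 3 - shouldn't match
    ('eight aaa', True),                # 4 - should match
    ('aaaa eight', True),               # 5 - should match
]
for text, expected in cases:
    found = regex.search(text) is not None
    print(repr(text), found, 'ok' if found == expected else 'WRONG')
```

Every line should end in ok: the boundaries reject "seven" inside "AAAsevenAAA" and "eight" inside "eightaaa", while the start and end of the string count as boundaries for #4 and #5.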

Note: if your regex contains escape sequences such as \b, use raw string notation (r'...'); in an ordinary string literal, '\b' is a backspace character, not a word boundary. The non-capturing group (?:...) groups the alternation so that the word boundaries surround every word. Without it, \bseven|eight\b puts a boundary only at the very beginning of the first alternative and at the very end of the last one.
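
Both points in the note can be seen side by side in the sketch below, which compares the answer's pattern with an ungrouped version and with a non-raw string literal. The test strings "sevenfold" and "weight" are illustrative choices, not from the question.

```python
import re

words = ('seven', 'eight')
alternation = '|'.join(words)

grouped = re.compile(r'\b(?:%s)\b' % alternation)  # \b(?:seven|eight)\b
ungrouped = re.compile(r'\b%s\b' % alternation)    # \bseven|eight\b
not_raw = re.compile('\b(?:%s)\b' % alternation)   # '\b' here is a backspace char

# Columns: text, grouped, ungrouped, not_raw
for text in ('sevenfold', 'weight', 'a seven b'):
    print(repr(text),
          grouped.search(text) is not None,
          ungrouped.search(text) is not None,
          not_raw.search(text) is not None)
# 'sevenfold' False True False   <- \bseven needs no boundary after it
# 'weight' False True False      <- eight\b needs no boundary before it
# 'a seven b' True True False    <- not_raw looks for literal backspaces
```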
