Python regular expression to search for words in a sentence


Question

I'm still learning the ropes with Python and regular expressions and I need some help, please! I need a regular expression that can search a sentence for specific words. I have managed to create a pattern that finds a single word, but how do I retrieve the other words I need to find? What would the re pattern look like to do this?

>>> import re
>>> question = "the total number of staff in 30?"
>>> re_pattern = r'\btotal.*?\b'
>>> m = re.findall(re_pattern, question)
>>> m
['total']

It must look for the words "total" and "staff". Thanks, Mike

Solution

Use the alternation operator | to search for all the words you need to find:

In [20]: re_pattern = r'\b(?:total|staff)\b'

In [21]: re.findall(re_pattern, question)
Out[21]: ['total', 'staff']
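If the words to look for are only known at run time, the same alternation can be assembled from a list. The following is a minimal sketch and not part of the original answer: build_word_pattern is a hypothetical helper name, and re.escape is applied on the assumption that a search word might contain regex metacharacters.

```python
import re

def build_word_pattern(words):
    """Compile a pattern that matches any of the given words as whole words."""
    # re.escape neutralises metacharacters such as '.' or '+' inside a word.
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r"\b(?:" + alternatives + r")\b")

question = "the total number of staff in 30?"
pattern = build_word_pattern(["total", "staff"])
found = pattern.findall(question)
print(found)  # ['total', 'staff']
```

Compiling the pattern once is convenient when the same word list is checked against many sentences.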

This matches your example above most closely. Note that \b matches at any position between a word character (a letter, digit, or underscore) and a non-word character, so punctuation appended to a word, such as a comma, a period, an exclamation mark, or a question mark at the end of a clause, does not prevent a match.

For example, in the question How many people are in your staff? the pattern above still finds the word staff, because there is a word boundary between the final f and the question mark. Both boundaries matter, though: if you leave out the second \b at the end of the regular expression above, the expression would wrongly detect words inside longer words, such as total in totally or totalities.
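The effect of dropping the trailing \b can be checked directly. The snippet below is a small illustration, not part of the original answer; the sample sentence and the variable names strict and loose are chosen here for demonstration only.

```python
import re

sentence = "My staff is totally overworked."

# With both boundaries, 'total' inside 'totally' is not reported.
strict = re.findall(r"\b(?:total|staff)\b", sentence)

# Without the trailing \b, the prefix of 'totally' counts as a match.
loose = re.findall(r"\b(?:total|staff)", sentence)

print(strict)  # ['staff']
print(loose)   # ['staff', 'total']
```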

Another way to accomplish what you want is to extract all the words (runs of alphanumeric characters and underscores) in your sentence first and then search this list for the words you need to find:

In [51]: def find_all_words(words, sentence):
....:     all_words = re.findall(r'\w+', sentence)
....:     words_found = []
....:     for word in words:
....:         if word in all_words:
....:             words_found.append(word)
....:     return words_found

In [52]: print(find_all_words(['total', 'staff'], 'The total number of staff in 30?'))
['total', 'staff']

In [53]: print(find_all_words(['total', 'staff'], 'My staff is totally overworked.'))
['staff']
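One limitation of the function above is that the membership test is case-sensitive, so Total at the start of a sentence would be missed. A possible variant is sketched below, assuming case should be ignored; find_all_words_ci is a hypothetical name, and the set is only there to make repeated lookups cheap.

```python
import re

def find_all_words_ci(words, sentence):
    """Return the entries of `words` that occur in `sentence`, ignoring case."""
    # Lower-case every token once and keep them in a set for fast lookups.
    tokens = {token.lower() for token in re.findall(r"\w+", sentence)}
    return [word for word in words if word.lower() in tokens]

print(find_all_words_ci(["total", "staff"], "Total staff: 30"))
# ['total', 'staff']
print(find_all_words_ci(["total", "staff"], "My staff is totally overworked."))
# ['staff']
```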
