Matching an expression using regex in Python
Problem description
I have many sentences, so I want to write a function that operates on each sentence individually; the input is just a string. My main objective is to extract the words that follow a preposition: for "near blue meadows", I want blue meadows to be extracted.
I keep all my prepositions in a text file. It works fine, but I think there is a problem with the regex used. Here is my code:
import re
with open("Input.txt") as f:
    words = "|".join(line.rstrip() for line in f)
pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words))
text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"
print(pattern.search(text3).group())
This returns:
AttributeError Traceback (most recent call last)
<ipython-input-83-be0cdffb436b> in <module>()
5 pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words))
6 text3 = ""
----> 7 print(pattern.search(text3).group())
AttributeError: 'NoneType' object has no attribute 'group'
The main problem is with the regex. My expected output is "hennur police", i.e. the two words after near. In my code, ({}) matches a preposition from the list, \s matches the space after it, (\d+\w+|\w+) matches a word such as 19th or hennur, and \s\w+ matches a space followed by another word. My regex fails to match, so search returns None and calling group() on it raises the error.
Why is it not working?
Contents of the Input.txt file:
['near','nr','opp','opposite','behind','towards','above','off']
Expected output:
hennur police
Recommended answer
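The file contains a Python list literal, not one word per line. Joined as-is, the brackets turn that line into a regex character class that matches a single character rather than a whole preposition. Use ast.literal_eval to parse the literal: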
>>> import ast
>>> ast.literal_eval("['near','nr','opp','opposite','behind','towards','above','off']")
['near', 'nr', 'opp', 'opposite', 'behind', 'towards', 'above', 'off']
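To make the difference concrete, here is a small sketch that builds the alternation both ways. It assumes the file holds exactly the one line shown in the question; the names raw_line, naive and parsed are only illustrative.

```python
import ast
import re

# Assumption: Input.txt holds exactly this one line, as shown in the question.
raw_line = "['near','nr','opp','opposite','behind','towards','above','off']"

# Joining the raw line keeps the brackets and quotes, so the regex engine
# reads the whole string as a single character class.
naive = "|".join([raw_line])

# Parsing the literal first yields a real alternation of words.
parsed = "|".join(ast.literal_eval(raw_line))
print(parsed)

# The character class can only ever match one character...
print(re.fullmatch("({})".format(naive), "n") is not None)
print(re.fullmatch("({})".format(naive), "near") is not None)
# ...whereas the alternation matches the whole preposition.
print(re.fullmatch("({})".format(parsed), "near") is not None)
```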
import ast
import re
with open("Input.txt") as f:
    words = '|'.join(ast.literal_eval(f.read()))
pattern = re.compile(r'(?:{})\s(\d*\w+\s\w+)'.format(words))
text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"
# If there could be multiple matches, use `findall` or `finditer`.
# When the pattern has a capturing group, `findall` returns the captured
# text instead of the entire matched string.
for place in pattern.findall(text3):
    print(place)
# If you want only the first match, use `search`.
# Use `group(1)` to get only group 1.
print(pattern.search(text3).group(1))
Output (the first line is printed by the for loop, the second comes from search(..).group(1)):
hennur police
hennur police
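The AttributeError in the question appears whenever search finds nothing and returns None. A minimal sketch of guarding against that follows; the helper name words_after_preposition is hypothetical, and the alternation is hardcoded here instead of being read from Input.txt so the snippet stands alone.

```python
import re

# Assumption: the prepositions are hardcoded instead of read from Input.txt.
pattern = re.compile(
    r'(?:near|nr|opp|opposite|behind|towards|above|off)\s(\d*\w+\s\w+)'
)

def words_after_preposition(text):
    """Return the two words after the first preposition, or None if absent."""
    match = pattern.search(text)
    # `search` returns None when nothing matches, so check before `group`.
    return match.group(1) if match else None

print(words_after_preposition("bangalore 43. near hennur police station"))
print(words_after_preposition(""))
```

With this check, an input without any preposition yields None instead of raising an exception.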
Note: you need to apply re.escape to each word if it contains any character that has a special meaning in regular expressions.
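As a rough illustration of the escaping step, the sketch below uses a made-up preposition list in which "opp." contains a dot; that list and the sample strings are assumptions, not part of the original Input.txt.

```python
import re

# Hypothetical list: "opp." contains a dot, which is a regex metacharacter.
prepositions = ['near', 'nr', 'opp.', 'opposite']

# Escape each word so it is matched literally, then join into an alternation.
escaped = '|'.join(re.escape(word) for word in prepositions)
pattern = re.compile(r'(?:{})\s(\d*\w+\s\w+)'.format(escaped))

# For comparison only: the same pattern built without escaping.
unescaped = re.compile(r'(?:{})\s(\d*\w+\s\w+)'.format('|'.join(prepositions)))

print(pattern.search("opp. city bus stand").group(1))
# Unescaped, "opp." also matches "oppo" because "." matches any character.
print(pattern.search("oppo site blue meadows"))
print(unescaped.search("oppo site blue meadows").group(1))
```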