Regex to find all sentences of text?
Question
I have been trying to teach myself regular expressions in Python, and I decided to print out all the sentences of a text. I have been tinkering with the regular expression for the past 3 hours, to no avail.
I just tried the following, but it does not work.
import re

p = open('anan.txt')
process = p.read()
# Attempted pattern from the question, kept as written (it does not split sentences)
regexMatch = re.findall(r'^[A-Z].+\s+[.!?]$', process, re.I)
print(regexMatch)
p.close()
My input file is like this:
OMG is this a question ! Is this a sentence ? My.
name is.
This prints no output. But when I remove "My. name is.", it prints "OMG is this a question !" and "Is this a sentence ?" together as a single match, as if it only reads the first line.
What is the best regex for finding all the sentences in a text file, regardless of whether a sentence carries over to a new line, while also reading the entire text? Thanks.
Recommended answer
Something like this:
>>> import re
## pattern: an uppercase letter, then anything that is not in (.!?), then one of them
>>> pat = re.compile(r'([A-Z][^\.!?]*[\.!?])', re.M)
>>> pat.findall('OMG is this a question ! Is this a sentence ? My. name is.')
['OMG is this a question !', 'Is this a sentence ?', 'My.']
Notice that "name is." is not in the result, because it does not start with an uppercase letter.
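Since the negated class [^.!?] also matches newline characters, the same pattern can pick up a sentence that wraps onto the next line. Below is a minimal sketch of that idea. The helper name find_sentences and the whitespace-collapsing step are assumptions added for illustration, not part of the original answer.

```python
import re

# Same idea as the pattern above: an uppercase letter, then any run of
# characters that are not sentence terminators, then one terminator.
# [^.!?] also matches '\n', so a match may span several lines.
SENTENCE = re.compile(r'[A-Z][^.!?]*[.!?]')

def find_sentences(text):
    """Return sentence-like matches with runs of whitespace collapsed.

    Hypothetical helper: the name and the normalization step are
    illustrative choices, not taken from the answer.
    """
    return [' '.join(match.split()) for match in SENTENCE.findall(text)]

# Sample text in which the second sentence is wrapped across two lines.
sample = "OMG is this a question ! Is this a\nsentence ? My.\nname is.\n"
print(find_sentences(sample))
```

As with the answer's pattern, "name is." would still be skipped here because it does not begin with an uppercase letter.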
Your problem comes from the use of the ^ and $ anchors. Without re.MULTILINE they apply to the whole text: ^ matches only at the start of the string, and $ only at its end (or just before a final newline).
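To see the effect of the anchors in isolation, here is a small sketch comparing the default behavior with re.MULTILINE on text shaped like the question's input. The variable names are assumptions chosen for illustration.

```python
import re

text = "OMG is this a question ! Is this a sentence ? My.\nname is."

# Default mode: ^ matches only at the very start of the string.
first_words_default = re.findall(r'^\w+', text)

# re.MULTILINE: ^ also matches immediately after each newline.
first_words_multiline = re.findall(r'^\w+', text, re.M)

# Default mode: $ matches only at the very end of the string.
last_words_default = re.findall(r'\w+\.$', text)

# re.MULTILINE: $ also matches just before each newline.
last_words_multiline = re.findall(r'\w+\.$', text, re.M)

print(first_words_default, first_words_multiline)
print(last_words_default, last_words_multiline)
```

This suggests why the question's pattern found nothing on the two-line input: in default mode the whole match has to stretch from the start of the text to its end, and ".+" cannot cross a newline.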