REGEX to capture sentences with quotes


Problem description

I am having trouble putting together a regex to match quotes and sentences. Here are the (simplified) specs I am trying to meet:

  • A sentence is a chain of characters followed by a punctuation mark (a dot, to keep things simple) or a newline.
  • A quote is a chain of characters between two double-quote characters (").
  • Each sentence should be a separate match.
  • A sentence can contain quotes, and quotes can contain sentences. Only the last sentence in a quote should end the capture.

So far I have come up with this: \s*((?:("[^"]*")|[^.\n])*\.+"?)\s*

Test cases: REGEX101

As you can see, I can't properly separate quotes from sentences. For example:

§2: "Your lordship," Mya informed Lord Robert, "Lady Waynwood’s banners have been seen an hour down the road. She will be here soon, with your cousin Harry. Will you want to greet them" should be one full match, but the regex gives me three matches and goes on to capture the next paragraph.

§3: "They were invited," she said uncertainly, "for the tourney. I don’t..." should end there as one full match, but the regex goes on to capture Alayne closed her book.
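The §3 behavior can be reproduced with a minimal sketch. The question does not name a regex flavor, so Python's `re` module is an assumption here, and the sample string is only the fragment quoted above, not the full test text. The `"[^"]*"` alternative consumes a whole quote, dots included, so the greedy loop runs past the closing quote and `\.+` first gets a chance to match at the next dot outside a quote.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r'\s*((?:("[^"]*")|[^.\n])*\.+"?)\s*')

# Fragment of §3 as quoted in the question; the exact source text is assumed.
text = (
    '"They were invited," she said uncertainly, '
    '"for the tourney. I don’t..." Alayne closed her book.'
)

# Group 1 is the outer capture that is meant to hold one sentence.
captures = [m.group(1) for m in ATTEMPT.finditer(text)]
for c in captures:
    print(repr(c))
```

On this fragment the pattern yields a single capture that spans the whole string, including the narration after the closing quote.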

I can't figure out what is going wrong; any help would be very much appreciated.

Expected output

Recommended answer

REGEX101

((?![.\n\s])[^.\n"]*(?:"[^\n"]*[^\n".]"[^.\n"]*)*(?:"[^"\n]+\."|\.|(?=\n)))

Breakdown:

  • (?![.\n\s]) - First, check that the match starts on a valid character (not whitespace and not a sentence terminator).
  • [^.\n"]* - Then match any text outside quotes that does not contain a sentence terminator.
  • (?:"[^\n"]*[^\n".]"[^.\n"]*) - Then match (in a non-capturing group) a quote that contains at least one character, contains no newline, and does not end with a sentence terminator, followed by zero or more characters that are outside quotes and contain no sentence terminator.
  • * - The previous non-capturing group can be repeated zero or more times (zero, so that sentences without quotes still match).
  • (?:"[^"\n]+\."|\.|(?=\n)) - Finally, match either a quote that ends with a full stop, or the full stop at the end of the sentence, or check that a newline follows.
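The breakdown above can be tried out with a short sketch. It assumes Python's `re` module (the answer does not name a regex flavor) and writes the same pattern in verbose mode so that each piece carries its own comment. The outer capturing group is dropped, so the whole match is read with `group(0)`. The sample text is assembled from the §2 and §3 fragments quoted in the question, and its line layout is assumed.

```python
import re

SENTENCE = re.compile(
    r'''
    (?![.\n\s])                     # do not start on a dot, newline or whitespace
    [^.\n"]*                        # unquoted text with no sentence terminator
    (?:"[^\n"]*[^\n".]"[^.\n"]*)*   # quotes not ending in a dot, each followed by unquoted text
    (?:"[^"\n]+\."|\.|(?=\n))       # end: quote ending in a dot, a bare dot, or a newline ahead
    ''',
    re.VERBOSE,
)

# Sample text built from the fragments in the question; layout is assumed.
text = (
    '"Your lordship," Mya informed Lord Robert, "Lady Waynwood’s banners have '
    'been seen an hour down the road. She will be here soon, with your cousin '
    'Harry. Will you want to greet them"\n'
    '"They were invited," she said uncertainly, "for the tourney. I don’t..." '
    'Alayne closed her book.\n'
)

sentences = [m.group(0) for m in SENTENCE.finditer(text)]
for s in sentences:
    print(repr(s))
```

On this sample the pattern produces three matches: the whole of §2, the quoted part of §3 up to the closing quote, and the narration that follows it. Because the last alternative only looks ahead for a newline, a final sentence that has neither a dot nor a trailing newline is not matched; appending a newline to the input is one way to handle that case.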
