Lookahead in Regex

Question

I am trying to use a regex to extract the venue from a file that contains several articles. The venue starts with either "For" or "From" and is followed by the date, which starts with a day of the week, or by the author's name if the date is missing. I wrote the following regex to match the venue, but it always matches everything up to the author's name, so the date ends up in the venue whenever the article has one.

"""((?<=\n)(?:(?:\bFrom\b)|(?:\bFor\b)).*?(?=(?:(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)|(?:[A-Z]+))))""".r

Why does my regex not stop at the day of the week when it encounters one, and instead carry on until it matches [A-Z]+, the author's name?

Input: "The Consequences of Hostilities Between the States

From the New York Packet.

Tuesday, November 20, 1787.

HAMILTON

To the People of the State of New York:"

The line "Tuesday, November 20, 1787." is optional and may not occur in every article. I want the output to be "From the New York Packet." I get the correct output for articles that have no date, but for articles that contain the date I get "From the New York Packet.

Tuesday, November 20, 1787."

Recommended answer

You only need to capture the entire line that starts with For or From, so you can simply use this:

^(For|From).*$

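The ^ and $ anchor the match to the start and end of the line, and the .* matches everything in between. Note that ^ and $ only work per line when multiline mode is enabled (for example with the (?m) flag); without it they anchor to the start and end of the whole input.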
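As a minimal sketch of this line-anchored approach, assuming Scala (suggested by the `.r` suffix in the question) and the `(?m)` flag so that ^ and $ match at line boundaries. The names `VenueExtractor` and `venues` are hypothetical, chosen only for illustration, and the sample text is the input from the question:

```scala
object VenueExtractor {
  // (?m) enables MULTILINE mode, so ^ and $ match at the start and end
  // of every line rather than only at the ends of the whole input.
  // The flag is an assumption; the answer's pattern does not show it.
  private val VenueLine = """(?m)^(For|From).*$""".r

  // Returns every line that starts with "For" or "From".
  def venues(text: String): List[String] =
    VenueLine.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    val article =
      """The Consequences of Hostilities Between the States
        |
        |From the New York Packet.
        |
        |Tuesday, November 20, 1787.
        |
        |HAMILTON
        |
        |To the People of the State of New York:""".stripMargin

    // Prints: From the New York Packet.
    venues(article).foreach(println)
  }
}
```

Because `.` does not match line terminators by default, the match cannot spill onto the date line, so no lookahead is needed. One caveat: this pattern also accepts lines beginning with words such as "Forest"; adding `\b` after the group would tighten it if that matters for the data.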

Here, try it out with whatever examples you like.

If this needs to be more complicated, I'll update my answer.
