Use regex to match a certain number of lines that follow the line containing a specific string

Question

I am working in InDesign, formatting large quantities of text. Here is a sample of the text.

NEW! Certificate in Office Operations (3 parts)
Office Operations
Cyber Security for Managers
Embracing Sustainability in the Workplace
Intro to 3D Printing
Intro to Maker Tech: The New Shop Class

I need to be able to match the three lines that follow a line containing the string "(3 parts)".

My thought was to try a positive lookbehind like this:

(?<=\(3 parts\)$)^.*$

But it doesn't match anything.

Recommended answer

The lookbehind part is correct, but the symbols ^ (Begin Paragraph) and $ (End Paragraph) match positions only, not the actual 'Hard return' characters. That is why your expression fails: by default, the . "match all" character does not match returns. So the first test, (?<=\(3 parts\)$)^., fails: neither the $ in the lookbehind nor the ^ consumes the return, and the following . does not match it either, per this default rule.
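The failure described above can be reproduced outside InDesign. The following is a minimal sketch in Python, not InDesign's own GREP engine: it assumes that Python's `re` module with `re.MULTILINE` behaves closely enough for illustration, and it uses "\n" in place of InDesign's hard return. The `SAMPLE` name is introduced here only for the example.

```python
import re

# Sample text from the question. "\n" stands in for InDesign's hard return
# (an assumption of this sketch, since Python treats only "\n" as a line break).
SAMPLE = "\n".join([
    "NEW! Certificate in Office Operations (3 parts)",
    "Office Operations",
    "Cyber Security for Managers",
    "Embracing Sustainability in the Workplace",
    "Intro to 3D Printing",
    "Intro to Maker Tech: The New Shop Class",
])

# re.MULTILINE makes ^ and $ match at line boundaries, comparable to
# paragraph boundaries in InDesign.
original = re.compile(r"(?<=\(3 parts\)$)^.*$", re.MULTILINE)

# $ and ^ are zero-width. Right after "(3 parts)" the ^ cannot match, and
# one character later the lookbehind no longer sees "(3 parts)".
first_try = original.search(SAMPLE)
print(first_try)  # prints None
```

No position in the text satisfies both the lookbehind and the ^ at once, which is why nothing is found.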

It is possible to put GREP into Single Line mode, a curious name that may put you on the wrong foot. It means that . is allowed to match a return as well, so that from the perspective of GREP an entire running text, hard returns and all, can be considered a "single (long) line". The code for that is (?s), and it is typically put at the very front of your expression.

That in itself is not enough to make it work, because

(?s)(?<=\(3 parts\)$)^.

still expects a return between the $ and the ^ (with no return there, one of the two could not match!). Anyway, it's not a good way to match a certain number of paragraphs. The adjusted expression

(?s)(?<=\(3 parts\)$).^.*

consumes the hard return correctly, but then selects everything up to the end of the text as well.
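The over-selection can be seen in a small Python sketch. This is an approximation rather than InDesign itself: it assumes "\n" as the stand-in for a hard return and adds the `m` flag so that ^ and $ work per line, which Python does not do by default. `SAMPLE`, `adjusted`, and `selected` are names made up for the example.

```python
import re

# "\n" stands in for InDesign's hard return (an assumption of this sketch).
SAMPLE = "\n".join([
    "NEW! Certificate in Office Operations (3 parts)",
    "Office Operations",
    "Cyber Security for Managers",
    "Embracing Sustainability in the Workplace",
    "Intro to 3D Printing",
    "Intro to Maker Tech: The New Shop Class",
])

# (?s) lets . match the line break; (?m) is added only because Python needs
# it for ^ and $ to match at line boundaries.
adjusted = re.compile(r"(?ms)(?<=\(3 parts\)$).^.*")

match = adjusted.search(SAMPLE)
selected = match.group(0)

# The first . consumes the line break, but .* then also crosses every
# later line break and runs to the end of the text.
print(selected.count("\n"))  # number of line breaks swallowed by the match
```

Because . now matches line breaks, nothing stops .* before the end of the text, so all five following lines are selected rather than three.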

I propose a much simpler approach: if you want to grab a certain number of hard returns, just include them directly in your expression. Their GREP code is \r.

This leads to the following:

(?<=\(3 parts\)\r)(.*\r){3}

where the lookbehind is what you already had, plus a return to end that particular line (it's in the lookbehind because you don't want to grab that return as well), followed by three repetitions of the sequence .*\r, which grabs an entire line.
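As a rough check of this approach outside InDesign, here is a minimal Python sketch. It is an analogue, not the InDesign expression itself: "\n" replaces \r because Python's . matches "\r" by default, and a non-capturing group is used for the repetition. The names `SAMPLE`, `final`, and `following` are hypothetical.

```python
import re

# "\n" stands in for InDesign's hard return \r (an assumption of this sketch).
SAMPLE = "\n".join([
    "NEW! Certificate in Office Operations (3 parts)",
    "Office Operations",
    "Cyber Security for Managers",
    "Embracing Sustainability in the Workplace",
    "Intro to 3D Printing",
    "Intro to Maker Tech: The New Shop Class",
])

# The line break that ends the "(3 parts)" line sits inside the lookbehind,
# so it is required but not selected. Each (?:.*\n) grabs one whole line
# including its line break, and {3} repeats that three times.
final = re.compile(r"(?<=\(3 parts\)\n)(?:.*\n){3}")

match = final.search(SAMPLE)
following = match.group(0).splitlines()
for line in following:
    print(line)
```

Without single-line mode, . stops at each line break, so every repetition covers exactly one line and the match ends after the third.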
