Lookahead in Kate for patterns

Question

I'm working on compiling a table of cases for a legal book. I've converted it to HTML so I can use the tags for search and replace operations, and I'm currently working in Kate. The text refers to the names of cases and the citations for the cases are in the footnotes, e.g.

<i>Smith v Jones</i>127 ......... [other stuff including newline characters].......<br/>127 (1937) 173 ER 406;

I've been able to get lookahead working in Kate, using:

<i>.*</i>([0-9]{1,4}) .+<br/>\1 .*<br/>

...but I've run into greediness problems.

The text is a mess, so I really need to find matches step by step rather than relying on a batch process.

Is there a Linux (or Windows) text editor that supports both lookahead AND non-greedy operators, or am I going to have to try grep or sed?

Recommended answer

I'm not familiar with Kate, but it seems to use QRegExp, which is incompatible with other Perl-like regex flavors in many important ways. For example, most flavors allow you to make individual quantifiers non-greedy by appending a question mark (e.g. .* => .*?), but in QRegExp you can only make them all greedy or all non-greedy. What's worse, it looks like Kate doesn't even let you do that (via a Non-Greedy checkbox, for example).

But it's best not to rely on non-greedy quantifiers all the time anyway. For one thing, they don't guarantee the shortest possible match, as many people claim. You should get in the habit of being more specific about what should and should not be matched, when that's not too difficult. For example, if the section you want to match doesn't contain any tags other than the ones in your sample string, you can do this:

<i>[^<]*</i>(\d+)\b[^<]+<br/>\1\b[^<]*<br/>
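As a rough illustration, here is the same regex run through Python's re module rather than Kate. The sample string is invented, modelled on the example in the question, and it assumes the footnote number directly follows a <br/> tag. Note that [^<] also matches newline characters, so no multi-line flag is needed.

```python
import re

# Hypothetical sample text; the wording between the tags is made up.
sample = (
    "<i>Smith v Jones</i>127 was decided\n"
    "on other grounds.<br/>"
    "127 (1937) 173 ER 406;<br/>"
)

# The regex from the answer, unchanged.
pattern = re.compile(r"<i>[^<]*</i>(\d+)\b[^<]+<br/>\1\b[^<]*<br/>")

m = pattern.search(sample)
case_number = m.group(1)   # the footnote number captured after </i>
matched_text = m.group(0)  # everything from <i> to the footnote's closing <br/>
print(case_number)
```

Because each [^<] run stops at the next tag, the match cannot spill past the footnote it belongs to.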

The advantage of using [^<]* instead of .* is that it will never try to match anything after the next <. .* will always grab the rest of the document at first, only to backtrack almost all the way to the starting point. The non-greedy version, .*?, will initially match only to the next <, but if the match attempt fails later on it will go ahead and consume the < and beyond, eventually to consume the whole document.
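The difference between the three approaches can be seen in a small sketch using Python's re module (not Kate's engine). The text and case names are hypothetical, and the patterns are simplified to just the italic case name followed by a footnote number.

```python
import re

# Hypothetical text: the first case name has no footnote number after it.
text = "<i>A v B</i> and <i>C v D</i>12 then <i>E v F</i>13"

# Greedy: .* runs to the end of the text, then backtracks to the last </i>+digits.
greedy = re.search(r"<i>(.*)</i>\d+", text).group(1)

# Non-greedy: starts short, but keeps growing past </i> when \d+ fails,
# so the result is not the shortest possible match.
lazy = re.search(r"<i>(.*?)</i>\d+", text).group(1)

# Negated class: can never cross a "<", so it fails at "A v B" and
# the engine moves on to the next <i>.
negated = re.search(r"<i>([^<]*)</i>\d+", text).group(1)

print(repr(greedy))
print(repr(lazy))
print(repr(negated))
```

Only the negated character class yields a single case name here; both dot-based versions start at the first <i> and swallow tags along the way.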

If there can be other tags, you can use [^<]*(<(?!br/>)[^<]*)* instead. It consumes any character that is not <, and also any < that is not the beginning of a <br/> tag.

<i>[^<]*</i>(\d+)\b[^<]*(<(?!br/>)[^<]*)*<br/>\1\b[^<]*(<(?!br/>)[^<]*)*<br/>
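A quick check of this longer regex, again as a sketch in Python's re module with an invented sample. The <b> tag in the body text is an assumption, added only to show where the simpler pattern stops working.

```python
import re

# Hypothetical sample containing an extra tag between the name and the footnote.
sample = (
    "<i>Smith v Jones</i>127 was followed in <b>later</b> cases.<br/>"
    "127 (1937) 173 ER 406;<br/>"
)

# Simple version: [^<]+ cannot step over the <b> tag.
simple = re.compile(r"<i>[^<]*</i>(\d+)\b[^<]+<br/>\1\b[^<]*<br/>")

# Longer version: a "<" is allowed as long as it does not start "<br/>".
tempered = re.compile(
    r"<i>[^<]*</i>(\d+)\b[^<]*(<(?!br/>)[^<]*)*<br/>"
    r"\1\b[^<]*(<(?!br/>)[^<]*)*<br/>"
)

simple_match = simple.search(sample)
tempered_match = tempered.search(sample)
print(simple_match)
print(tempered_match.group(1))
```

The simple pattern finds nothing in this sample, while the longer one still pairs the citation with its footnote.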

By the way, what you're calling a lookahead (I'm assuming you mean \1) is really a backreference. The (?!br/>) in my regex is an example of a lookahead, in this case a negative lookahead. The Kate/QRegExp docs claim that lookaheads are supported but non-capturing groups, e.g. (?:...), aren't, which is why I used all capturing groups in that last regex.
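To make the terminology concrete, here is a minimal sketch in Python's re module, which supports backreferences, lookaheads, and non-capturing groups. The test strings are made up, and the non-capturing rewrite is only an illustration for engines that accept (?:...); it is not something Kate is claimed to support.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured.
backref = re.compile(r"(\d+) and \1\b")
same_number = backref.search("127 and 127") is not None
other_number = backref.search("127 and 128") is not None

# Negative lookahead: matches "<" only when "br/>" does not follow,
# and consumes nothing beyond the "<" itself.
not_br = re.compile(r"<(?!br/>)")
bold_open = not_br.match("<b>")
br_open = not_br.match("<br/>")

# The last regex, with capturing groups as in the answer ...
capturing = re.compile(
    r"<i>[^<]*</i>(\d+)\b[^<]*(<(?!br/>)[^<]*)*<br/>"
    r"\1\b[^<]*(<(?!br/>)[^<]*)*<br/>"
)
# ... and with non-capturing groups, so only the footnote number is captured.
noncapturing = re.compile(
    r"<i>[^<]*</i>(\d+)\b[^<]*(?:<(?!br/>)[^<]*)*<br/>"
    r"\1\b[^<]*(?:<(?!br/>)[^<]*)*<br/>"
)
print(capturing.groups, noncapturing.groups)
```

Both versions match the same text; the non-capturing form simply avoids creating groups that are never read.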

If you have the option of switching to a different editor, I strongly recommend that you do so. My favorite is EditPad Pro; it has the best regex support I've ever seen in an editor.
