Regex Non-Duplicate Bigrams

Problem description

I want a PCRE regex to create bigram pairings similar to this question, but without duplicate words.

Full Match: apple orange plum
Group 1: apple orange
Group 2: orange plum

The closest I’ve gotten to it is this, but ‘orange’ isn’t captured in the second group.

(\b.+\b)(\g<1>)\b

Recommended answer

You're looking for this:

/(?=(\b\w+\s+\w+))/g

Here's a quick perl one-liner to demonstrate it:

$ perl -e 'while ("apple orange plum" =~ /(?=(\b\w+\s+\w+))/g) { print "$1\n" }'
apple orange
orange plum
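For comparison, here is a minimal sketch of the same search in Python, assuming Python's built-in `re` module as a stand-in for Perl (it accepts the same `(?=…)`, `\b`, `\w` and `\s` syntax used here). The names `BIGRAM` and `bigrams` are hypothetical. Because the pattern has exactly one capture group, `re.findall` returns that group's text for each match:

```python
import re

# Same pattern as the Perl one-liner (BIGRAM is a hypothetical name):
# a zero-width lookahead wrapped around a single capture group.
BIGRAM = re.compile(r"(?=(\b\w+\s+\w+))")


def bigrams(text):
    """Return every overlapping pair of adjacent words in text."""
    # With one capture group, findall returns group 1 for every match.
    return BIGRAM.findall(text)


print(bigrams("apple orange plum"))  # ['apple orange', 'orange plum']
```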

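This wraps the capture group in a zero-width lookahead (?=…). Because the lookahead consumes no characters, each match is empty, so the next search starts just past the previous match's starting position rather than after the captured text, and the word "orange" can be read twice.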
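To see the zero-width behaviour directly, this sketch (again assuming Python's `re` as a stand-in for Perl; `bigram_spans` is a hypothetical helper) records, for each match, the span of the whole match and the span of group 1. The whole match is always empty, while group 1 covers the bigram, which is why the second match can begin at "orange":

```python
import re

BIGRAM = re.compile(r"(?=(\b\w+\s+\w+))")


def bigram_spans(text):
    """List (whole-match span, group-1 span, group-1 text) for each match."""
    return [(m.span(), m.span(1), m.group(1)) for m in BIGRAM.finditer(text)]


for whole, group, word_pair in bigram_spans("apple orange plum"):
    # whole is always (n, n): the lookahead consumes no characters
    print(whole, group, word_pair)
# (0, 0) (0, 12) apple orange
# (6, 6) (6, 17) orange plum
```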

If we used /(\b\w+\s+\w+)/g instead, we'd get "apple orange" but not the second match, because the left-to-right processing of the regular expression would have already passed over the word "orange".
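The difference is easy to reproduce. In this sketch (Python's `re` assumed as a stand-in for Perl; `consuming_bigrams` is a hypothetical name), the capture group is not inside a lookahead, so the first match consumes "apple orange" and the search resumes at the space before "plum", where no two-word sequence remains:

```python
import re


def consuming_bigrams(text):
    """Bigram search WITHOUT the lookahead: matches cannot overlap."""
    # Each match consumes its text, so "orange" is used up by the first match.
    return re.findall(r"(\b\w+\s+\w+)", text)


print(consuming_bigrams("apple orange plum"))  # ['apple orange']
```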

If we omit the word boundary \b, the regex engine would give us "apple orange", then "pple orange", "ple orange", etc., including "orange plum" later on, but also "range plum" through "e plum", since those all satisfy the pattern.
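This sketch (Python's `re` assumed as a stand-in for Perl; `unanchored_bigrams` is a hypothetical name) keeps the lookahead but drops `\b`, so a candidate match may begin at any letter inside a word:

```python
import re


def unanchored_bigrams(text):
    """Overlapping bigram search with the leading word boundary removed."""
    # Without the boundary, every position inside a word can start a match.
    return re.findall(r"(?=(\w+\s+\w+))", text)


for pair in unanchored_bigrams("apple orange plum"):
    print(pair)
```

On "apple orange plum" this yields 11 strings: "apple orange" down to "e orange", then "orange plum" down to "e plum".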

A full explanation of my original regex is available on Regex101.