Select two consecutive words by regular expression

Problem description

I'm new to regular expressions, and I want to write one that selects every pair of consecutive words.

For example, when I give it this phrase: "Hello people #RegularExpression sucks!"

It should return these word pairs:

-Hello people

-people #RegularExpression

-#RegularExpression sucks!

I tried /\w\s\w/i, but it did not work :(

Recommended answer

$s = "Hello people #RegularExpression sucks!";
preg_match_all('~(?=(\S+\s+\S+))\S+\s+~', $s, $matches);
print_r($matches[1]);

Output:

Array
(
    [0] => Hello people
    [1] => people #RegularExpression
    [2] => #RegularExpression sucks!
)

Explanation:

\S+ matches one or more non-whitespace characters. Your \w was incorrect for two reasons: it matches only one character, and it matches only a so-called word character (equivalent to [A-Za-z0-9_]), so it cannot match # or !. Adding the + to your \s wasn't necessary in this test case, but there's no reason not to add it, and extra whitespace does have a way of sneaking into text in the real world. (But be sure to add +, not *; there must be at least one whitespace character in there.)
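To make the difference concrete, here is a small sketch that runs the original attempt and a plain \S+\s+\S+ against the sample phrase. The variable names ($attempt, $plain) are only illustrative. Note that the plain pattern, without any lookahead, still misses the middle pair because each match consumes both of its words.

```php
<?php
$s = "Hello people #RegularExpression sucks!";

// Original attempt: exactly one word character, one whitespace
// character, one word character.
preg_match_all('~\w\s\w~i', $s, $attempt);
print_r($attempt[0]);   // "o p" and "n s": single characters around a space

// \S+\s+\S+ matches whole whitespace-delimited tokens, but every match
// consumes both words, so the overlapping pair in the middle is skipped.
preg_match_all('~\S+\s+\S+~', $s, $plain);
print_r($plain[0]);     // "Hello people" and "#RegularExpression sucks!"
```

The pair "people #RegularExpression" never shows up in the second result, which is the gap the lookahead in the answer is there to close.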

(?=...) is a positive lookahead. You use one to check whether the enclosed subexpression can match at the current match position, without advancing that position. Then, typically, you go on to match a different subexpression outside the lookahead.
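A minimal sketch of that behavior, using a shortened input string chosen only for illustration: a pattern consisting of nothing but the lookahead succeeds with an empty match at offset 0, and adding an ordinary \S+ after it consumes just the first word.

```php
<?php
$s = "Hello people";

// Lookahead only: it verifies that two words lie ahead but consumes nothing,
// so the overall match is the empty string at offset 0.
preg_match('~(?=\S+\s+\S+)~', $s, $check, PREG_OFFSET_CAPTURE);
print_r($check[0]);     // ['', 0]

// Lookahead followed by a normal subexpression: only \S+ consumes text.
preg_match('~(?=\S+\s+\S+)\S+~', $s, $word);
print_r($word[0]);      // "Hello"

// With a single word there is no second word ahead, so nothing matches.
$found = preg_match('~(?=\S+\s+\S+)\S+~', 'sucks!');
var_dump($found);       // int(0)
```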

Here's the tricky bit: although the characters matched by the lookahead subexpression are not consumed, any capturing groups inside it work as usual. The lookahead in my regex, (?=(\S+\s+\S+)), matches and captures the next two-word sequence. Then (assuming the lookahead succeeded) \S+\s+ matches in the normal way, consuming only the first word and the whitespace after it, which sets the match position correctly for the next attempt.
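One way to see the two parts working together is to ask preg_match_all for offsets and compare what each match consumed with what group 1 captured. This is only a sketch; the helper variables ($consumed, $captured, $offsets) are introduced here for illustration.

```php
<?php
$s = "Hello people #RegularExpression sucks!";

// PREG_OFFSET_CAPTURE turns every entry into a [text, offset] pair.
preg_match_all('~(?=(\S+\s+\S+))\S+\s+~', $s, $matches, PREG_OFFSET_CAPTURE);

$consumed = array_column($matches[0], 0);  // text consumed by \S+\s+
$offsets  = array_column($matches[0], 1);  // where each match started
$captured = array_column($matches[1], 0);  // pair captured inside the lookahead

foreach ($captured as $i => $pair) {
    printf("at %2d consumed %-22s captured %s\n",
        $offsets[$i], '"' . $consumed[$i] . '"', '"' . $pair . '"');
}
```

Each match consumes one word plus its trailing whitespace, so the next attempt starts at the second word of the pair just captured. The final attempt starts at "sucks!", where the lookahead fails because no second word follows.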

This technique should work in any regex flavor that supports capturing groups and lookaheads. That includes PHP as well as all the other major languages (Perl, JavaScript, .NET, Python, Java...). The way you access only the contents of the first capturing group from each match varies wildly from one language to the next, but PHP makes it easy with $matches[1].
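For comparison, PHP can also return the results grouped per match, which is closer to how many other languages expose them. The sketch below uses the PREG_SET_ORDER flag and collects group 1 from each match; the $pairs array is just an illustrative name.

```php
<?php
$s = "Hello people #RegularExpression sucks!";

// With PREG_SET_ORDER, $matches is a list of matches; in each one,
// index 0 is the consumed text and index 1 is capturing group 1.
preg_match_all('~(?=(\S+\s+\S+))\S+\s+~', $s, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[] = $match[1];   // the pair captured inside the lookahead
}
print_r($pairs);
```

With the default ordering used in the answer, the same list is available directly as $matches[1], so the loop is unnecessary there.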
