Regex: how to match any string until whitespace, or until punctuation followed by whitespace?


Problem description


I'm trying to write a regular expression which will find URLs in a plain-text string, so that I can wrap them with anchor tags. I know there are expressions already available for this, but I want to create my own, mostly because I want to know how it works.


Since it's not going to break anything if my regex fails, my plan is to write something fairly simple. So far that means: 1) match "www" or "http" at the start of a word, and 2) keep matching until the word ends.


I can do that, AFAICT. I have this: \b(http|www).?[^\s]+

It works for: foo www.example.com bar http://www.example.com


The problem is that if I give it "foo www.example.com, http://www.example.com", it thinks that the comma is part of the URL.
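Running the pattern from the question through preg_match_all on both sample strings shows the difference. A minimal sketch; the variable names are illustrative only.

```php
<?php
// The pattern from the question: "http" or "www" at a word boundary,
// then everything up to the next whitespace character.
$pattern = '/\b(http|www).?[^\s]+/';

// Works: each URL is followed directly by whitespace or the end of the string.
preg_match_all($pattern, 'foo www.example.com bar http://www.example.com', $ok);
echo implode(' | ', $ok[0]), "\n"; // www.example.com | http://www.example.com

// Fails: a comma is not whitespace, so [^\s]+ swallows it into the first URL.
preg_match_all($pattern, 'foo www.example.com, http://www.example.com', $bad);
echo implode(' | ', $bad[0]), "\n"; // www.example.com, | http://www.example.com
```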


So, if I am to use one expression to do this, I need to change "...and stop when you see whitespace" to "...and stop when you see whitespace or a piece of punctuation right before whitespace". This is what I'm not sure how to do.
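One way to state that rule in a single expression is with a lookahead. This is a minimal sketch under assumptions, not the approach the asker or the answer ends up using: a lazy \S+? consumes one character at a time and stops at the first position where the lookahead sees only punctuation (or nothing) before whitespace or the end of the string. The punctuation set [.,;:!?] is an assumption to adjust as needed.

```php
<?php
// \S+?                   lazily consume non-whitespace characters...
// (?=[.,;:!?]*(?:\s|$))  ...until only punctuation (possibly none) remains
//                        before whitespace or the end of the input.
// The punctuation set [.,;:!?] is an assumption; extend it as needed.
$pattern = '/\b(?:http|www)\S+?(?=[.,;:!?]*(?:\s|$))/';

$text = 'foo www.example.com, http://www.example.com. See www.example.org/page!';
preg_match_all($pattern, $text, $m);
echo implode(' | ', $m[0]), "\n";
// www.example.com | http://www.example.com | www.example.org/page
```

Because the lookahead consumes nothing, preg_replace with the replacement '<a href="$0">$0</a>' would leave the trailing punctuation and whitespace in place without having to capture and re-append them.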


At the moment, a solution I'm thinking of running with is just adding another test – matching the URL, and then on the next line moving any sneaky punctuation. This just isn't as elegant.
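The two-step approach described here can be sketched with preg_replace_callback: match greedily up to whitespace, then use rtrim to peel trailing punctuation off the match and re-append it after the link. The helper name link_urls and the character list passed to rtrim are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper: step 1 matches up to whitespace, step 2 trims punctuation.
function link_urls(string $text): string
{
    return preg_replace_callback(
        '/\b(?:http|www)\S+/',                     // step 1: greedy match up to whitespace
        function (array $m): string {
            $url  = rtrim($m[0], '.,;:!?');       // step 2: drop trailing punctuation...
            $tail = substr($m[0], strlen($url));  // ...but keep it to put back afterwards
            return '<a href="' . $url . '">' . $url . '</a>' . $tail;
        },
        $text
    );
}

echo link_urls('foo www.example.com, http://www.example.com'), "\n";
// foo <a href="www.example.com">www.example.com</a>, <a href="http://www.example.com">http://www.example.com</a>
```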


Note: I am writing this in PHP.


Aside: why does replacing \s with \b in the expression above not seem to work?

Edit:

Thanks, everyone!


This is what I eventually ended up with, based on Explosion Pills's advice:

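function add_links( $string ) {
    // A closure instead of a nested named function: declaring "function replace()"
    // inside add_links() causes a fatal "Cannot redeclare" error on the second call,
    // and the bare word replace is an undefined constant in PHP 8.
    $replace = function ( $arr ) {
        // Prefix http:// unless the match already starts with "http".
        $href = ( strncmp( "http", $arr[1], 4 ) == 0 ) ? $arr[1] : "http://" . $arr[1];
        // $arr[2] = trailing punctuation (may be empty), $arr[3] = whitespace or end of string.
        return "<a href=\"$href\">$arr[1]</a>$arr[2]$arr[3]";
    };
    return preg_replace_callback( '/\b((?:http|www).+?)((?!\/)[\p{P}]+)?(\s|$)/x', $replace, $string );
}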


I added a callback so that all of the links would start with http://, and did some fiddling with the way it handles punctuation.
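The punctuation part of that pattern, ((?!\/)[\p{P}]+)?, splits off a trailing run of punctuation (\p{P}) unless the run starts with a slash, so a trailing "/" stays in the URL. The check below runs the same pattern (minus the x flag, which has no effect on it) over a few made-up inputs to show what lands in each group; the inputs are illustrative only.

```php
<?php
// Same pattern as in add_links(), without the /x flag.
$pattern = '/\b((?:http|www).+?)((?!\/)[\p{P}]+)?(\s|$)/';

foreach (['www.example.com, bar', 'http://example.com/ bar', 'www.example.com/, bar'] as $input) {
    preg_match($pattern, $input, $m);
    // $m[1] = the URL, $m[2] = trailing punctuation that was split off (may be empty)
    echo $m[1], ' | ', $m[2] ?? '', "\n";
}
// www.example.com | ,
// http://example.com/ |
// www.example.com/ | ,
```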


It's probably not the Best way to do things, but it works. I've learned a lot about this in the last little while, but there is still more to learn!

Recommended answer

// $string holds the plain-text input, as in the question's add_links()
$result = preg_replace('/
    \b        # Initial word boundary
    (         # Start capture
    (?:       # Non-capture group
    http|www  # http or www (alternation)
    )         # end group
    .+?       # reluctant match for at least one character until...
    )         # End capture
    (         # Start capture
    [,.]+     # ...one or more of either a comma or period.
              # add more punctuation as needed
    )?        # End optional capture
    (\s|$)    # Followed by either a whitespace character or end of string
    /x', '<a href="\1">\1</a>\2\3', $string);


...is probably what you are going for. I think it's still imperfect, but it should at least work for your needs.
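To see what "imperfect" can mean in practice, the sketch below runs the answer's pattern (on one line, without the x-mode comments) over the question's sample string and over a made-up input ending in a semicolon, which is not in the [,.] class.

```php
<?php
// The answer's pattern on one line: \1 = URL, \2 = stripped punctuation, \3 = whitespace or end.
$pattern     = '/\b((?:http|www).+?)([,.]+)?(\s|$)/';
$replacement = '<a href="\1">\1</a>\2\3';

echo preg_replace($pattern, $replacement, 'foo www.example.com, http://www.example.com'), "\n";
// foo <a href="www.example.com">www.example.com</a>, <a href="http://www.example.com">http://www.example.com</a>

// Punctuation outside [,.] is still treated as part of the URL.
echo preg_replace($pattern, $replacement, 'see www.example.com; bye'), "\n";
// see <a href="www.example.com;">www.example.com;</a> bye
```

Note also that "www." links come out with an href that has no scheme, which is what the asker's callback fixes by prepending http://.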


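Aside: \b is a zero-width word-boundary assertion, not a character, so it cannot be negated inside a character class. Inside [...], PCRE reads \b as the backspace character, so [^\b]+ matches every character except backspace, spaces and punctuation included, and runs on to the end of the string.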
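A small check of what \b does inside a character class: in PCRE, \b within [...] means the backspace character rather than a word boundary, which is why swapping [^\s] for [^\b] changes the result so much. The variable names are illustrative.

```php
<?php
$text = 'foo www.example.com, http://www.example.com';

// [^\s]+ : stops at the first whitespace character.
preg_match('/\b(http|www).?[^\s]+/', $text, $withS);
// [^\b]+ : inside a class, \b is the backspace character (0x08), so this
// matches everything except backspace and runs on to the end of the string.
preg_match('/\b(http|www).?[^\b]+/', $text, $withB);

echo $withS[0], "\n"; // www.example.com,
echo $withB[0], "\n"; // www.example.com, http://www.example.com
```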
