How to exclude a word or string from a URL - Regex

Question

I'm using the following regex to match all types of URLs in PHP (it works very well):

 $reg_exUrl = "%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";

But now I want to exclude YouTube, youtu.be and Vimeo URLs.

After some research I tried something like this, but it does not work:

$reg_exUrl = "%\b(([\w-]+://?|www[.])(?!youtube|youtu|vimeo)[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";

I want to do this because I have another regex that matches YouTube URLs and returns an iframe, and this regex conflicts with it.

Any help would be greatly appreciated, thanks.

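Solution

socodLib, to exclude something from a string, place yourself at the beginning of the string by anchoring with ^ (or use another anchor) and use a negative lookahead to assert that the string does not contain the word, like so:

^(?!.*?(?:youtube|some other bad word|some\.string\.with\.dots))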

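Before we make the regex look too complex by concatenating it with yours, let's see what we would do if you wanted to match some word characters \w+ but not youtube or google. You would write: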

^(?!.*?(?:youtube|google))\w+

As you can see, after the assertion (where we say what we don't want), we say what we do want by using \w+.
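A minimal PHP sketch of the toy pattern above, assuming a few made-up sample strings that are not part of the original question. It relies only on preg_match, which returns 1 when the pattern matches and 0 when it does not.

```php
<?php
// Toy pattern from the answer: match word characters, but only when the
// subject contains neither "youtube" nor "google" anywhere after the start.
$pattern = '/^(?!.*?(?:youtube|google))\w+/';

// Hypothetical sample inputs, chosen only for illustration.
$samples = ['example', 'myyoutubepage', 'google', 'vimeo_clip'];

$results = [];
foreach ($samples as $s) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[$s] = preg_match($pattern, $s);
}

print_r($results);
// example => 1, myyoutubepage => 0, google => 0, vimeo_clip => 1
```

Note that the forbidden word is rejected wherever it appears in the string, not only at the start, because the lookahead's .*? can skip over any leading characters.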

In your case, let's add a negative lookahead to your initial regex (which I have not tuned):

$reg_exUrl = "%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";

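I took the liberty of making the regex case-insensitive with (?i). You could instead append i to the s modifier at the end (giving %si). The youtu\.?be expression allows for an optional dot, so it covers both youtube and youtu.be.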
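As a quick check, the sketch below runs the pattern against a handful of strings that each hold a single URL. The sample URLs are assumptions made up for illustration. It also compares the inline (?i) form with the variant that appends i to the trailing modifiers, which should behave the same way.

```php
<?php
// The answer's pattern with the inline (?i) flag.
$inline = '%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';
// The same pattern with i appended to the trailing modifiers instead.
$flagged = '%\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%si';

// Hypothetical inputs, one URL per string.
$urls = [
    'https://www.example.com/page',
    'www.example.org',
    'https://www.youtube.com/watch?v=abc',
    'http://youtu.be/abc',
    'HTTPS://VIMEO.COM/12345',
];

$matched = [];
foreach ($urls as $url) {
    $a = preg_match($inline, $url);   // 1 = matched, 0 = excluded
    $b = preg_match($flagged, $url);
    $matched[$url] = [$a, $b];
    printf("%-40s inline=%d flagged=%d\n", $url, $a, $b);
}
```

With these inputs the two example.* URLs match and the YouTube, youtu.be and Vimeo URLs are rejected, including the upper-case one.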

I am certain you can apply this recipe to your expression and to other regexes in the future.
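One caveat when applying the recipe to text that holds several URLs: the lookahead (?!.*?...) scans the rest of the whole subject, so a YouTube or Vimeo link later in the text also rejects ordinary URLs that come before it. The sketch below is my own variant, not the answer author's, and assumes it is acceptable to limit the lookahead to the characters a URL may contain ([^\s()<>]*), so that it stops at the first whitespace. The sample text is hypothetical.

```php
<?php
// Hypothetical text containing several URLs (not from the original post).
$text = 'see https://example.com and https://youtu.be/abc or www.vimeo.com/1';

// The answer's pattern: the lookahead scans the rest of the whole subject.
$wholeSubject = '%\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%si';

// Assumed variant: the lookahead only scans characters a URL may contain,
// so it stops at the first whitespace or bracket.
$perUrl = '%\b(?![^\s()<>]*(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%i';

$countWhole = preg_match_all($wholeSubject, $text, $whole);
$countPerUrl = preg_match_all($perUrl, $text, $scoped);

print_r($whole[0]);   // empty: the later video links block every match
print_r($scoped[0]);  // only https://example.com
```

Both patterns still reject any URL that merely contains one of the words, for example a path such as /vimeo-review, so a stricter host check may be needed if that matters.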
