Regex ignore URL already in HTML tags

Views: 127 | Published: 2018/6/13 16:35:48 | Tags: php html regex preg-replace url-parsing

Question

I'm having a small problem with my regex.

I've made a custom BBCode for my website; however, I also want URLs to be parsed.

I'm using preg_replace, and this is the pattern used to identify URLs:
/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/is
This works well; however, if a URL is inside an [img][/img] block, the pattern above also picks it up and produces a result like this:
// [img]http://url.com/toimg.jeg[/img] will produce this result:
<img src="<a href="http://url.com/toimg.jeg" target="_blank">/>
// When it should produce:
<img src="http://url.com/toimg.jeg"/>
I tried using this:
/([^"][\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/][^"])/is
but with no luck.

Any help will be appreciated.

Edit: For the solution, see the second comment on stema's answer.
Solution
Try this:
(?<!href=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])
See it here on Regexr

To make it more general, you can simplify the lookbehind to check only for =" (which also covers other attributes, such as src="):
(?<!=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])
See it on Regexr
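
As a minimal sketch of how the =" lookbehind might be used with preg_replace, the snippet below runs it over a string that contains one bare URL and one URL already inside an img tag. The input string and the replacement markup are assumptions modelled on the question, not the asker's actual code. The hyphen in the character class is escaped here as a precaution, since newer PCRE2-based PHP builds may reject an unescaped hyphen right after \w as an invalid range.

```php
<?php
// Pattern from the answer, with the hyphen escaped (\-) as a precaution.
$pattern = '/(?<!=")(\b[\w]+:\/\/[\w\-?&;#~=\.\/\@]+[\w\/])/is';

// Hypothetical replacement markup, modelled on the output shown in the question.
$replacement = '<a href="$1" target="_blank">$1</a>';

// Assumed input: the [img] BBCode has already been converted to an <img> tag,
// so the image URL is preceded by =" and should be left alone.
$html = 'Visit http://example.com/page and <img src="http://url.com/toimg.jeg"/>';

// Only the bare URL is wrapped in a link; the src value is skipped.
$result = preg_replace($pattern, $replacement, $html);

echo $result, PHP_EOL;
```

This relies on the URL auto-linking step running after the BBCode tags have been turned into HTML, so that attribute values are already preceded by =" when the pattern is applied.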

(?<!href=") is a negative lookbehind assertion; it ensures that href=" does not appear immediately before the match.

\b is a word boundary that anchors the start of the link to a transition from a non-word character to a word character. Without it, the lookbehind would be useless: the regex would simply skip the first letter and match from "ttp://..." onward.
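
The effect of \b can be checked with a small sketch that runs the same pattern with and without the word boundary against a URL that is already inside an href attribute. The sample string and variable names are hypothetical, and the hyphen is again escaped as a precaution for newer PCRE2-based PHP builds.

```php
<?php
// Assumed sample: a URL that is already the value of an href attribute.
$html = '<a href="http://url.com/page">link</a>';

// Same pattern without and with \b (hyphen escaped as a precaution).
$withoutBoundary = '/(?<!href=")([\w]+:\/\/[\w\-?&;#~=\.\/\@]+[\w\/])/is';
$withBoundary    = '/(?<!href=")(\b[\w]+:\/\/[\w\-?&;#~=\.\/\@]+[\w\/])/is';

// Without \b, the lookbehind only blocks the match starting at "h";
// the engine moves one character forward and matches from "ttp://".
$foundWithout = preg_match($withoutBoundary, $html, $m);
$matchedText  = $foundWithout ? $m[1] : null;

// With \b, a match cannot start in the middle of "http", so nothing matches.
$foundWith = preg_match($withBoundary, $html);

echo 'without \b: ', var_export($matchedText, true), PHP_EOL;
echo 'with \b: ', $foundWith, ' match(es)', PHP_EOL;
```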
