Regex ignore URL already in HTML tags
Problem description
I'm having a small problem with my regex.
I've made a custom BBcode for my website, but I also want bare URLs to be parsed.
I'm using preg_replace, and this is the pattern used to identify URLs:
/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/is
This works well, but if a URL is inside an [img][/img] block, the pattern also picks it up and produces a result like this:
//[img]http://url.com/toimg.jeg[/img] will produce this result:
<img src="<a href="http://url.com/toimg.jeg" target="_blank">/>
//When it should produce:
<img src="http://url.com/toimg.jeg"/>
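The failure can be reproduced in a few lines. This is only a sketch under assumptions: it uses Python's re module rather than PHP's preg_replace, it assumes the [img] tag has already been turned into an <img> element before URLs are linked, and the name URL_PATTERN and the replacement template are illustrative. The hyphen in the character class is escaped because Python rejects \w-? as a range, and the redundant escapes on . / and @ are dropped.

```python
import re

# The question's pattern, adapted for Python's re (hyphen escaped).
# re.I and re.S mirror the /is flags of the original.
URL_PATTERN = re.compile(r'([\w]+://[\w\-?&;#~=./@]+[\w/])', re.I | re.S)

# Assumed state after the [img] BBcode has been converted.
html = '<img src="http://url.com/toimg.jeg"/>'

# Linking every URL also rewrites the one inside the src attribute.
linked = URL_PATTERN.sub(r'<a href="\1" target="_blank">\1</a>', html)
print(linked)
# <img src="<a href="http://url.com/toimg.jeg" target="_blank">http://url.com/toimg.jeg</a>"/>
```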
I tried using this:
/([^"][\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/][^"])/is
With no luck.
Any help will be appreciated.
Edit: For the solution, see the second comment on stema's answer.
Try this:
(?<!href=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])
See it here on Regexr
To make it more general, you can simplify your lookbehind to check only for =" before the URL:
(?<!=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])
See it on Regexr
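The difference between the two lookbehinds shows up when they are run side by side. The following is a rough sketch in Python's re module, not PHP; BODY, SKIP_HREF, SKIP_ATTR and autolink are hypothetical names, and the hyphen in the character class is escaped because Python would otherwise reject \w-? as a range.

```python
import re

# URL body shared by both variants (adapted for Python's re).
BODY = r'[\w]+://[\w\-?&;#~=./@]+[\w/]'

# Skips URLs that directly follow href="
SKIP_HREF = re.compile(r'(?<!href=")(\b' + BODY + ')', re.I | re.S)
# Skips URLs that directly follow =" for any attribute
SKIP_ATTR = re.compile(r'(?<!=")(\b' + BODY + ')', re.I | re.S)

def autolink(pattern, text):
    """Wrap every URL matched by pattern in an anchor tag."""
    return pattern.sub(r'<a href="\1" target="_blank">\1</a>', text)

anchor = '<a href="http://url.com/x">link</a>'
image = '<img src="http://url.com/toimg.jeg"/>'

print(autolink(SKIP_HREF, anchor) == anchor)  # True: existing link is left alone
print(autolink(SKIP_HREF, image) == image)    # False: src="..." is still rewritten
print(autolink(SKIP_ATTR, image) == image)    # True: any ="..." value is skipped
print(autolink(SKIP_ATTR, 'see http://example.com/page'))
# see <a href="http://example.com/page" target="_blank">http://example.com/page</a>
```

Only the =" variant protects the src attribute from the question, because the href=" lookbehind checks for that one attribute name.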
(?<!href=")
is a negative lookbehind assertion. It ensures that href=" does not appear immediately before your pattern.
\b
is a word boundary that anchors the start of your link to a change from a non-word character to a word character. Without it, the lookbehind would be useless: the regex engine would skip the leading h and match from ttp://... onward.
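The effect of the word boundary can be checked directly. This is a minimal sketch in Python's re module rather than PHP; WITHOUT_B, WITH_B and BODY are illustrative names, and the hyphen in the character class is escaped to satisfy Python's parser.

```python
import re

BODY = r'[\w]+://[\w\-?&;#~=./@]+[\w/]'

# Same lookbehind, with and without the leading \b.
WITHOUT_B = re.compile(r'(?<!href=")(' + BODY + ')', re.I | re.S)
WITH_B = re.compile(r'(?<!href=")(\b' + BODY + ')', re.I | re.S)

text = '<a href="http://url.com/x">'

# Without \b the engine slides one character right, past the lookbehind.
print(WITHOUT_B.search(text).group(1))  # ttp://url.com/x

# With \b the match must start at the h, where the lookbehind rejects it.
print(WITH_B.search(text))  # None
```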