regex - match not in tag


Problem description


This should be easy, but somehow I can't figure it out. I have an HTML snippet like this one:

<p style="padding:0 10 20 30; margin: 1 2 3 4 ">This is 201 some 20 text 1 <b>30</b> with some numbers 30 20</p> ...

I need to match the numbers 1, 20, and 30 (only those) and replace them with links. Obviously, I do not want to replace numbers inside a tag.

The output should be:

<p style="padding:0 10 20 30; margin: 1 2 3 4 ">This is 201 some <a href="#20">20</a> text <a href="#1">1</a> <b><a href="#30">30</a></b> with some numbers <a href="#30">30</a> <a href="#20">20</a></p> ...

This is what I have:

$text = '<p style="padding:0 10 20 30; margin: 1 2 3 4 ">This is 201 some 20 text 1 <b>30</b> with some numbers 30 20</p> ...';

$pat[]  = '/(?<=\>)([^<]*)([^0-9\:])(1|20|30)([^0-9])/s';
$repl[] = '$1$2<a href="#$3" class="p2">$3</a>$4';
echo preg_replace($pat, $repl, $text);

It works, but it matches only one number at a time, and I do not want to run it in a loop.

Any ideas?

--

I see the point of using an HTML parser; however, this seems like something that can be done with a regexp, especially when there is no standard library for parsing HTML in PHP, and I'm not sure I want to import a third-party HTML parser just for this task. Any attempt to fix my regex?

--

I managed to write a regexp that works in my case, if anyone is interested:

$pat[] = '/>(([^<]*)(([^0-9\:]))|())(1|20|30)(?(?=[<]+?)(?!<\/a>)|(([^0-9\<])([^<]*)<(?!\/a>)))/sU';
$repl[] = '>$1<a href="#$6" class="p22">$6</a>$7';

I know very well that this can be easily accomplished with an HTML parser, but I do not want to include third-party parsers in my software.

Regards, Philia

Solution

It is really simple: extract only the text with an HTML parser, then use regular expressions on that.
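
A minimal sketch of that idea, assuming PHP's DOM extension (DOMDocument and DOMXPath) is available, so no third-party parser is needed. The helper name linkNumbers, the wrapper id, and the choice to skip text already inside an <a> element are assumptions made for illustration and are not part of the original post. The regular expression only ever sees text nodes, so attribute values such as the style string are never touched.

```php
<?php
// Hypothetical helper: wraps the given numbers in links, but only where they
// occur in text nodes. Tags and attribute values are left untouched.
function linkNumbers(string $html, array $numbers = ['1', '20', '30']): string
{
    libxml_use_internal_errors(true);   // tolerate imperfect markup quietly
    $doc = new DOMDocument();
    // Wrap the fragment in a known container so it can be recovered later.
    // The XML encoding hint asks libxml to read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="link-numbers-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="link-numbers-root"]')->item(0);

    // Match a listed number only when it is not part of a longer number.
    $alternatives = implode('|', array_map('preg_quote', $numbers));
    $pattern      = '/(?<!\d)(' . $alternatives . ')(?!\d)/';

    // Collect text nodes first (skipping existing links), then modify the tree.
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $root));
    foreach ($textNodes as $node) {
        // Odd indexes of $parts hold the captured numbers, even indexes the text between.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // nothing to replace in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $link = $doc->createElement('a');
                $link->setAttribute('href', '#' . $part);
                $link->appendChild($doc->createTextNode($part));
                $fragment->appendChild($link);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$text = '<p style="padding:0 10 20 30; margin: 1 2 3 4 ">This is 201 some 20 text 1 <b>30</b> with some numbers 30 20</p> ...';
echo linkNumbers($text), PHP_EOL;
```

All matches in a text node are handled in one pass, so there is no need to run the replacement in a loop. The parser may normalize the markup slightly when it serializes the result, so the output is not guaranteed to be byte-identical to the input outside the added links.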
