Regex - Matching exactly one single tag
Question
I have a regex to extract the text from an HTML font tag:
<FONT FACE=\"Excelsior LT Std Bold\"(.*)>(.*)</FONT>
That's working fine until I have some nested font tags. Instead of matching
<FONT FACE="Excelsior LT Std Bold">Fett</FONT>
the result for the string
<FONT FACE="Excelsior LT Std Bold">Fett</FONT> + <U>Unterstrichen</U> + <FONT FACE="Excelsior LT Std Italic">Kursiv</FONT> und Normal
is
<FONT FACE="Excelsior LT Std Bold">Fett</FONT> + <U>Unterstrichen</U> + <FONT FACE="Excelsior LT Std Italic"
How do I get only the first tag?
Recommended answer
You need to disable greedy matching by using .*? instead of .*:
<FONT FACE=\"Excelsior LT Std Bold\"([^>]*)>(.*?)</FONT>
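To make the difference concrete, here is a minimal sketch that runs both patterns against the string from the question. It assumes Python's re module, which the question does not specify, and the variable names are purely illustrative.

```python
import re

# The sample string from the question.
text = (
    '<FONT FACE="Excelsior LT Std Bold">Fett</FONT> + <U>Unterstrichen</U> + '
    '<FONT FACE="Excelsior LT Std Italic">Kursiv</FONT> und Normal'
)

# Original pattern: both groups are greedy and run as far as they can.
greedy = re.compile(r'<FONT FACE="Excelsior LT Std Bold"(.*)>(.*)</FONT>')
# Suggested pattern: [^>]* stays inside the opening tag, .*? stops at the
# first closing tag.
lazy = re.compile(r'<FONT FACE="Excelsior LT Std Bold"([^>]*)>(.*?)</FONT>')

greedy_match = greedy.search(text)
lazy_match = lazy.search(text)

print(greedy_match.group(2))  # Kursiv
print(lazy_match.group(0))    # <FONT FACE="Excelsior LT Std Bold">Fett</FONT>
print(lazy_match.group(2))    # Fett
```

The greedy pattern swallows everything up to the last closing FONT tag, so its second group ends up holding the text of the second tag; the lazy pattern stops at the first closing tag.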
Note that this will fail if there is an attribute like BadAttribute="<FooBar>" somewhere after the FACE attribute of the <FONT> tag. That will mix up the two matching groups, and the result could get completely messed up if an attribute contained </FONT>. There is no way around this because regular expressions cannot count matching tags or quotes. So I absolutely agree with Tomalak: try to avoid using regular expressions for processing XML, HTML, and other markup languages like these.
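The failure case described above, and the parser-based alternative, can be sketched as follows. This assumes Python and its standard-library html.parser module, which the original answer does not mention; the FontTextExtractor class is a hypothetical helper written for this illustration, not an existing API.

```python
import re
from html.parser import HTMLParser


class FontTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text inside <font> tags with a given face."""

    def __init__(self, face):
        super().__init__()
        self.face = face
        self.depth = 0      # nesting depth inside a matching <font> tag
        self.results = []   # one entry per matching outermost <font> tag
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "font":
            return
        if self.depth > 0:
            self.depth += 1  # nested <font> inside a matching one
        elif dict(attrs).get("face") == self.face:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "font" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer))

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


# An opening tag with a '>' hidden inside a quoted attribute value.
text = ('<FONT FACE="Excelsior LT Std Bold" BadAttribute="<FooBar>">Fett</FONT>'
        ' + <U>Unterstrichen</U>')

lazy = re.compile(r'<FONT FACE="Excelsior LT Std Bold"([^>]*)>(.*?)</FONT>')
regex_text = lazy.search(text).group(2)
print(repr(regex_text))  # '">Fett' (the groups are split at the wrong '>')

parser = FontTextExtractor("Excelsior LT Std Bold")
parser.feed(text)
parser.close()
print(parser.results)
```

With this input the regex splits its groups at the '>' inside the attribute value, while the parser treats the quoted value as a single unit and should collect only Fett.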