Regex help required
Question
I am trying to replace two or more occurrences of <br/> tags in a row (like <br/><br/><br/>) with exactly two (<br/><br/>), using the following pattern:
Pattern brTagPattern = Pattern.compile("(<\\s*br\\s*/\\s*>\\s*){2,}",
Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
But in some cases the tags are separated by a space ('<br/> <br/>'), and they get replaced with 4 <br/> tags when they should have been replaced with just 2.
What can I do to ignore the two or three (few) spaces that come between the tags?
Recommended answer
Probably not the answer you want to hear, but it is general wisdom that you should not attempt to parse XML/HTML with regular expressions. So many things can go wrong -- it's a much better idea to use a parsing library specifically meant for such data, which will also completely bypass the issue you're having.
Take a look at JAXB if you are certain your HTML is well-formed XML; if the HTML is likely to be messy and non-compliant (like most real-world HTML), you should try something like TagSoup.
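If a parser is not an option and the input is known to be simple, the following is a minimal regex-only sketch, not a definitive fix. It rests on assumptions the question does not confirm: that the "spaces" between tags may be non-breaking spaces (the character U+00A0 or the &nbsp; entity), which \s does not match by default in Java, and that the slash in the tag may be missing. The class name BrCollapser and the method collapse are hypothetical, introduced only for illustration.

```java
import java.util.regex.Pattern;

public class BrCollapser {
    // Assumption: separators between tags may be ordinary whitespace,
    // a non-breaking space character (U+00A0), or the &nbsp; entity.
    // Assumption: the slash is optional, so <br>, <br/> and <br /> all match.
    private static final Pattern BR_RUN = Pattern.compile(
            "(?:<\\s*br\\s*/?\\s*>(?:\\s|\\u00A0|&nbsp;)*){2,}",
            Pattern.CASE_INSENSITIVE);

    // Replaces every run of two or more <br> tags (with optional
    // separators in between) by exactly two tags.
    public static String collapse(String html) {
        return BR_RUN.matcher(html).replaceAll("<br/><br/>");
    }

    public static void main(String[] args) {
        System.out.println(collapse("a<br/> <br/>  <br />b"));
    }
}
```

Like the pattern in the question, this one also swallows any separators that directly follow the last tag of a run. A single <br/> is left untouched because the quantifier requires at least two tags.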