Regex help required


Question

I am trying to replace every run of two or more consecutive <br/> tags (for example <br/><br/><br/>) with exactly two tags, <br/><br/>, using the following pattern:

Pattern brTagPattern = Pattern.compile("(<\\s*br\\s*/\\s*>\\s*){2,}", 
     Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

In some cases, however, the tags are separated by a space, as in '<br/> <br/>'. These end up replaced with four <br/> tags, when they should be replaced with just two.

What can I do to ignore the two or three (a few) spaces that come between the tags?
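For reference, the replacement described above could be sketched roughly as follows. The question does not show the replacement call, so the class name `BrCollapser`, the method `collapse`, and the `GAP` constant are hypothetical. Note that the original pattern already allows `\s*` after each tag, so ordinary spaces should be absorbed by it. One possible explanation, which the question does not confirm, is that the gaps are non-breaking spaces (U+00A0 or the `&nbsp;` entity), which `\s` does not match by default in Java. The sketch widens the gap under that assumption.

```java
import java.util.regex.Pattern;

class BrCollapser {
    // Assumption: the gap between tags may be ordinary whitespace,
    // a literal non-breaking space (U+00A0), or the &nbsp; entity.
    private static final String GAP = "(?:\\s|\\u00A0|&nbsp;)*";

    // Same tag shape as the pattern in the question, repeated two or more times.
    // DOTALL is omitted because the pattern contains no '.' for it to affect.
    private static final Pattern BR_RUN = Pattern.compile(
            "(?:<\\s*br\\s*/\\s*>" + GAP + "){2,}",
            Pattern.CASE_INSENSITIVE);

    // Replaces each run of two or more <br/> tags (and the gaps inside
    // and directly after the run) with exactly two tags.
    static String collapse(String html) {
        return BR_RUN.matcher(html).replaceAll("<br/><br/>");
    }
}
```

Calling `BrCollapser.collapse(html)` leaves a single `<br/>` untouched and collapses longer runs. As with the original pattern, any whitespace directly after the last tag in a run is consumed along with it.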

Recommended answer

Probably not the answer you want to hear, but the general wisdom is that you should not attempt to parse XML/HTML with regular expressions. Too many things can go wrong. It is a much better idea to use a parsing library meant specifically for such data, which will also completely bypass the issue you're having.

Take a look at JAXB if you are certain your HTML is well-formed XML. If the HTML is likely to be messy and non-compliant (like most real-world HTML), you should try something like TagSoup.
