Jsoup: check if a string is valid HTML

Question

I am having difficulties with the Jsoup parser. How can I tell whether a given string is valid HTML?

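String input = "Your vote was successfully added.";
// Jsoup.isValid takes a Whitelist as its second argument; Whitelist.basic() is an assumed choice here.
boolean isValid = Jsoup.isValid(input, Whitelist.basic());
// isValid == true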

The isValid flag is true because Jsoup first uses HtmlTreeBuilder: if any of the html, head, or body tags is missing, it adds them itself. It then uses the Cleaner class and checks the result against the given Whitelist.

Is there a simple way to check whether a string is valid HTML without Jsoup trying to turn it into HTML?

My example is an AJAX response that arrives with the "text/html" content type. It then goes to the parser, Jsoup adds these tags, and as a result the response is not displayed properly.

Thanks for your help.

Recommended answer

First of all, the solution proposed by Reuben does not work as expected. The pattern has to be compiled with the Pattern.DOTALL flag: without it, "." does not match line terminators, and the input HTML may (and probably will) contain newline characters.

So it should look like this:

Pattern htmlPattern = Pattern.compile(".*\\<[^>]+>.*", Pattern.DOTALL);
boolean isHTML = htmlPattern.matcher(input).matches();
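
To illustrate why the flag matters, here is a minimal sketch that compiles the same expression with and without Pattern.DOTALL and runs both against a multi-line input. The class name HtmlTagCheck and the helper containsTag are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class HtmlTagCheck {
    // Same expression as in the answer; only the flags differ.
    static final String TAG_REGEX = ".*\\<[^>]+>.*";

    static final Pattern WITHOUT_DOTALL = Pattern.compile(TAG_REGEX);
    static final Pattern WITH_DOTALL = Pattern.compile(TAG_REGEX, Pattern.DOTALL);

    // True if the whole input matches the pattern, i.e. it contains
    // at least one tag-like token such as <p> or </div>.
    static boolean containsTag(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String multiLine = "<p>\nYour vote was successfully added.\n</p>";
        String plainText = "Your vote was successfully added.";

        // Without DOTALL, "." cannot cross the newlines, so matches() fails.
        System.out.println(containsTag(WITHOUT_DOTALL, multiLine)); // false
        // With DOTALL, "." also matches line terminators.
        System.out.println(containsTag(WITH_DOTALL, multiLine));    // true
        // Plain text contains no tag-like token either way.
        System.out.println(containsTag(WITH_DOTALL, plainText));    // false
    }
}
```

Note that this only detects something that looks like a tag somewhere in the string; it does not establish that the markup is well-formed.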

I also think this pattern should find the HTML tag itself, not just any tag. In addition, the bare tag is not the only valid form: the tag may also carry attributes, and that has to be handled as well.
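
The attribute case mentioned above could be handled along these lines. This is only a sketch under assumptions: the pattern, the class name HtmlRootCheck, and the method hasHtmlRoot are hypothetical and are not taken from the answer or from Jsoup.

```java
import java.util.regex.Pattern;

class HtmlRootCheck {
    // Hypothetical pattern: an opening html tag, optionally followed by
    // whitespace and attributes, in any letter case.
    static final Pattern HTML_OPEN_TAG =
            Pattern.compile("<html(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // find() scans for the tag anywhere in the input. The pattern does not
    // use ".", so newlines in the input need no DOTALL flag here.
    static boolean hasHtmlRoot(String input) {
        return HTML_OPEN_TAG.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasHtmlRoot("<html><body>Hi</body></html>"));
        System.out.println(hasHtmlRoot("<html lang=\"en\">\n<body>Hi</body>\n</html>"));
        System.out.println(hasHtmlRoot("<p>Your vote was successfully added.</p>"));
    }
}
```

A regular expression like this is a heuristic: it would also match an html tag that appears inside a comment or an attribute value, so it cannot replace a real parser.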

I chose to modify the Jsoup source. If HtmlTreeBuilder (actually the BeforeHtml state) tries to add an <html> element, I throw a ParseException, and then I can be sure that the input was not a valid HTML file.
