Querying an HTML page with XPath in Java

Views: 1629 | Published: 2018/6/26 12:06:13 | Tags: java html jaxp xpath


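Question

Can anyone recommend a Java library that lets me run XPath queries against an HTML page? I tried using JAXP, but it keeps giving me a strange error that I cannot seem to fix (thread "main" java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd).

Thank you very much.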

EDIT

I found this:

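    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    // Create a new SAX parser factory
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();

    // Turn on validation
    saxFactory.setValidating(true);

    // Create a validating SAX parser instance
    SAXParser parser = saxFactory.newSAXParser();

    // Create a new DOM document builder factory
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();

    // Turn on validation
    domFactory.setValidating(true);

    // Create a validating DOM parser
    DocumentBuilder builder = domFactory.newDocumentBuilder();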

(from http://www.ibm.com/developerworks/xml/library/x-jaxpval.html). However, changing the argument to false did not change anything.

Solution

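Setting the parser to "non-validating" only turns off validation; it does not stop the parser from fetching DTDs. A DTD is fetched not just for validation but also for entity expansion, as far as I recall.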

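If you want to suppress fetching of DTDs, you need to register a suitable EntityResolver on the DocumentBuilder by calling setEntityResolver (DocumentBuilderFactory itself has no such setter). Implement the EntityResolver's resolveEntity method so that it always returns an InputSource wrapping an empty string.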
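The approach described in the answer could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name XhtmlXPathDemo, the helper methods parseWithoutDtd and titleOf, and the sample page are all made up for this example, and the input is assumed to be well-formed XHTML that uses no named entities from the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; none of these names come from the original thread.
public class XhtmlXPathDemo {

    // Parses XHTML text while answering every external entity request
    // (including the DOCTYPE's DTD) with empty content, so nothing is downloaded.
    static Document parseWithoutDtd(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The resolver returns an InputSource over an empty string instead of
        // letting the parser open the system identifier URL.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Runs a simple XPath query and returns the text of the page title.
    static String titleOf(String xhtml) throws Exception {
        Document doc = parseWithoutDtd(xhtml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/html/head/title", doc);
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: a tiny page that declares the XHTML 1.0 Transitional DTD.
        String page =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html><head><title>Hello</title></head>"
                + "<body><p>text</p></body></html>";
        System.out.println(titleOf(page));
    }
}
```

Because the DTD is replaced with empty content, named entities such as &nbsp; no longer have definitions, so pages that rely on them may not parse as expected. This only helps with well-formed XHTML; ordinary "tag soup" HTML would still need to be cleaned up before an XML parser can read it.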

