Help with Java Swing HTML parsing

Views: 169  Posted: 2020/11/24 21:07:52  Tags: java, swing, html-parsing

Question

I am parsing a collection of HTML documents with the Java Swing HTML parsing library, and I am trying to isolate the text between the <title> tags so that I can use it to identify each document. However, I am having a hard time doing this because the handleStartTag method doesn't have access to the text inside the tags.

Recommended answer

You can use XPath to pull out data from HTML:

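import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

String html = "..."; // the XHTML document to parse (elided in the original)

// Read the markup into a DOM (an identity transform: parse only, no stylesheet)
StreamSource source = new StreamSource(new StringReader(html));
DOMResult result = new DOMResult();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(source, result);
Node root = result.getNode();

// Use XPath to get the title; <title> sits inside <head>, not directly under <html>
XPath xpath = XPathFactory.newInstance().newXPath();
String title = xpath.evaluate("/html/head/title", root);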
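A self-contained sketch of the same approach follows, as a class that can be compiled and run. The names XhtmlTitle and extractTitle are hypothetical, not part of the original answer. It also assumes the documents may declare the XHTML namespace: the identity transform typically builds a namespace-aware DOM, and in that case an unprefixed path such as /html/head/title matches nothing, so this sketch uses the namespace-agnostic expression //*[local-name()='title'] instead.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

public class XhtmlTitle {

    // Parses well-formed XHTML into a DOM via an identity transform,
    // then returns the text of the first <title> element found by XPath.
    public static String extractTitle(String xhtml)
            throws TransformerException, XPathExpressionException {
        DOMResult result = new DOMResult();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new StreamSource(new StringReader(xhtml)), result);
        Node root = result.getNode();

        XPath xpath = XPathFactory.newInstance().newXPath();
        // local-name() matches <title> whether or not the document declares
        // xmlns="http://www.w3.org/1999/xhtml" (assumption: either may occur).
        return xpath.evaluate("//*[local-name()='title']", root).trim();
    }

    public static void main(String[] args) throws Exception {
        // Note the self-closed <br />: the input must be well-formed XML.
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Quarterly Report</title></head>"
                + "<body><p>Line one<br />Line two</p></body></html>";
        System.out.println(extractTitle(page));
    }
}
```

Using local-name() trades a little precision for robustness; if every document is known to use the XHTML namespace, a NamespaceContext with a prefixed path would be the stricter choice.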

However, for this to work the HTML must be well-formed XHTML, because the transformer parses it as XML. For example, the "<br>" tag is valid in HTML but invalid in XHTML because it is not closed; to be valid XHTML it must be written as "<br />".
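If the documents are not well-formed XHTML, the Swing parser from the question can still recover the title. It reports the title's text through the separate handleText callback, which is called after handleStartTag(TITLE) and before handleEndTag(TITLE), so the callback only needs to remember whether it is currently inside <title>. The following is a minimal sketch of that idea using HTMLEditorKit.ParserCallback and ParserDelegator; the class names SwingTitle and TitleCallback are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingTitle {

    // Collects the text reported between the <title> start and end tags.
    // The text arrives via handleText, not handleStartTag, so the callback
    // tracks whether the parser is currently inside <title>.
    static class TitleCallback extends HTMLEditorKit.ParserCallback {
        private boolean inTitle = false;
        private final StringBuilder title = new StringBuilder();

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag == HTML.Tag.TITLE) {
                inTitle = true;
            }
        }

        @Override
        public void handleEndTag(HTML.Tag tag, int pos) {
            if (tag == HTML.Tag.TITLE) {
                inTitle = false;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            // The text may arrive in several chunks, so append rather than assign.
            if (inTitle) {
                title.append(data);
            }
        }

        String getTitle() {
            return title.toString().trim();
        }
    }

    public static String extractTitle(Reader html) throws IOException {
        TitleCallback callback = new TitleCallback();
        // true = ignore any charset declaration; the Reader already yields chars
        new ParserDelegator().parse(html, callback, true);
        return callback.getTitle();
    }

    public static void main(String[] args) throws IOException {
        // Plain HTML with an unclosed <br>, which the XML-based approach rejects.
        String page = "<html><head><title>Quarterly Report</title></head>"
                + "<body>Line one<br>Line two</body></html>";
        System.out.println(extractTitle(new StringReader(page)));
    }
}
```

Because the Swing parser follows its built-in HTML 3.2 DTD rather than XML rules, it tolerates unclosed tags such as <br>, which makes it the more forgiving option when the input is ordinary HTML.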
