Is there a way to manipulate partial HTML pages using JSoup

Views: 89 | Published: 2021/2/14 18:45:43 | Tags: java, html, jsoup


Question

I am developing a utility that has to traverse a set of HTML files and manipulate them.

JSoup does a wonderful job of parsing and manipulating files that are complete (i.e. they have <html> ... </html> tags).

However, for partial pages, i.e. pages that would contain markup like

<div id="leftnav">...</div>

it parses correctly, but when doc.toString() or doc.outerHtml() is called, it returns full HTML (it wraps the partial HTML content in <html> <body> ... </body> </html> tags).

This is a problem for me. Could you let me know whether JSoup provides an API that does not sanitize / clean up the HTML content in this manner?

Thanks.

Recommended answer

You can use Parser.xmlParser():

Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat it as HTML, rather creates a simple tree directly from the input.

In other words: it doesn't create the typical HTML structure (html, body, head, etc.) and takes your input as it is.

Here is how to use it:

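import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Using connect(): fetch the page and parse it with the XML parser
Document fetched = Jsoup.connect("<url>").parser(Parser.xmlParser()).get();

// Using parse(): parse a markup string with the XML parser
Document parsed = Jsoup.parse("<html>", "<base url>", Parser.xmlParser());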
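To make the difference concrete, here is a minimal sketch that parses the same fragment with the default HTML parser and with the XML parser. The class name PartialHtmlDemo, the helper parseFragment, the link inside the div, and the empty base URI are assumptions made for illustration; they are not part of the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class PartialHtmlDemo {
    // Hypothetical helper: parse a fragment with the XML parser so that
    // no <html>/<head>/<body> wrapper is added around it.
    static Document parseFragment(String fragment) {
        // The empty string is the base URI; nothing here resolves relative links.
        return Jsoup.parse(fragment, "", Parser.xmlParser());
    }

    public static void main(String[] args) {
        String fragment = "<div id=\"leftnav\"><a href=\"/home\">Home</a></div>";

        // Default HTML parser: the output is wrapped in a full document skeleton.
        Document asHtml = Jsoup.parse(fragment);
        System.out.println(asHtml.outerHtml());

        // XML parser: the tree contains only what was in the input.
        Document asXml = parseFragment(fragment);
        // Selectors and attribute edits work the same way on this tree.
        asXml.select("#leftnav").first().attr("class", "nav");
        System.out.println(asXml.outerHtml());
    }
}
```

Because the XML parser applies no HTML-specific rules, markup that depends on them (for example, unclosed void tags) may be handled differently, so it is worth checking the output against your own files.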
