Is there a way to manipulate partial HTML pages using JSoup
Problem description
I am developing a utility that has to traverse a set of HTML files and manipulate them.
JSoup does a wonderful job of parsing and manipulating files that are complete (i.e. they have <html> ... </html> tags).
However, for partial pages, i.e. pages that contain only markup like:
<div id="leftnav">...</div>
it parses correctly, but when doc.toString() or doc.outerHtml() is called, it returns a full HTML document (it wraps the partial HTML content in <html> <body> ... </body> </html> tags).
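A minimal sketch that reproduces the behaviour described above, assuming a recent jsoup release on the classpath (the class and method names are illustrative only). `Jsoup.parse` always builds a complete document, so serializing the `Document` itself brings back the `<html>`, `<head>` and `<body>` elements the fragment never had:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class WrapDemo {

    // Parses the input as a full HTML document and serializes the whole Document.
    static String reserialize(String html) {
        Document doc = Jsoup.parse(html);
        // Document.outerHtml() emits the synthesized <html><head></head><body> skeleton too.
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        // The partial page comes back wrapped in <html> ... <body> ... </body></html>.
        System.out.println(reserialize("<div id=\"leftnav\">...</div>"));
    }
}
```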
This is a problem for me. Could you please let me know whether JSoup has an API that does not sanitize/clean the HTML content in this way?
Thanks.
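One approach worth trying, sketched under the assumption that the partial pages contain only body-level markup such as the `<div id="leftnav">` above: `Jsoup.parseBodyFragment` places the fragment's nodes inside `<body>`, and `doc.body().html()` serializes only the body's contents, so the wrapper never reaches the output. The class name, the method name, and the `data-processed` attribute are hypothetical, used only to show a manipulation step between parsing and writing back:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FragmentRoundTrip {

    // Parses an HTML fragment, tags every #leftnav element, and returns only
    // the fragment markup (no <html>/<head>/<body> wrapper).
    static String markLeftNav(String fragmentHtml) {
        // parseBodyFragment still builds a full Document internally,
        // but the fragment's nodes end up as the children of <body>.
        Document doc = Jsoup.parseBodyFragment(fragmentHtml);
        // Disable pretty-printing so jsoup does not re-indent the markup.
        doc.outputSettings().prettyPrint(false);

        // Example manipulation; "data-processed" is a hypothetical attribute.
        doc.select("#leftnav").attr("data-processed", "true");

        // body().html() returns the inner HTML of <body>, i.e. just the fragment.
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(markLeftNav("<div id=\"leftnav\">...</div>"));
    }
}
```

Markup that is not allowed directly inside `<body>` (for example a bare `<td>` or `<tr>`) may be dropped or restructured by the HTML parser, so it is worth checking this against the real input files.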