Removing text enclosed between HTML tags using JSoup
Question
In some cases of HTML cleaning, I would like to retain the text enclosed between the tags (which is the default behaviour of Jsoup), and in other cases I would like to remove the text as well as the HTML tags. Can someone please shed some light on how I can remove the text enclosed between HTML tags using Jsoup?
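Answer
The Cleaner (Jsoup.clean, see http://jsoup.org/apidocs/org/jsoup/Jsoup.html#clean%28java.lang.String,%20java.lang.String,%20org.jsoup.safety.Whitelist%29) always drops tags and preserves text. If you need to drop whole elements (i.e. the tags together with their text and nested elements), you can pre-parse the HTML, remove the elements using either remove() or empty(), and then run the result through the cleaner. remove() deletes the matched elements and everything inside them; empty() deletes only their children, leaving empty tags that the cleaner then strips if they are not whitelisted.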
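The difference between remove() and empty() is easiest to see on a small tree, before any cleaning happens. This is a minimal sketch, assuming jsoup is on the classpath; the class RemoveVsEmpty and its helpers removeMatches and emptyMatches are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RemoveVsEmpty {

    // Deletes every element matching cssQuery, including its text and
    // nested elements, and returns the resulting <body> HTML.
    static String removeMatches(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        doc.select(cssQuery).remove();
        return doc.body().html();
    }

    // Deletes only the child nodes (text and nested elements) of every
    // element matching cssQuery; the now-empty tags stay in the tree.
    static String emptyMatches(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        doc.select(cssQuery).empty();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "Keep <b>this</b> <div>drop <i>me</i></div>";
        // The <div> and its contents disappear completely.
        System.out.println(removeMatches(html, "div"));
        // An empty <div></div> remains where the text used to be.
        System.out.println(emptyMatches(html, "div"));
    }
}
```

With empty(), the leftover tag is harmless when the next step is Jsoup.clean with a whitelist that does not allow it, since the cleaner drops it anyway.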
For example:
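import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist; // jsoup 1.14.1+ provides org.jsoup.safety.Safelist as the replacement

String html = "Clean <div>Text dropped</div>";
Document doc = Jsoup.parse(html);
doc.select("div").remove();
// if not removed, the cleaner will drop the <div> but leave the inner text
String clean = Jsoup.clean(doc.body().html(), Whitelist.basic());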
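The question asks for both behaviours from the same code path. One way to get that, sketched here under the assumption of jsoup 1.14.1 or later (where Safelist replaces the older Whitelist class), is a small wrapper that removes only the elements you name and leaves everything else to the cleaner. SelectiveClean and its clean method are hypothetical names, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist;

public class SelectiveClean {

    // Hypothetical helper: elements matching dropQuery are removed together
    // with their contents; everything else goes through the cleaner, which
    // strips non-safelisted tags but keeps their text.
    static String clean(String html, String dropQuery, Safelist safelist) {
        Document doc = Jsoup.parse(html);
        if (dropQuery != null && !dropQuery.isEmpty()) {
            doc.select(dropQuery).remove();
        }
        return Jsoup.clean(doc.body().html(), safelist);
    }

    public static void main(String[] args) {
        String html = "Clean <div>Text dropped</div> <span class=\"ad\">Buy now</span>";

        // Case 1: only tags are stripped, all text is kept (the cleaner's default).
        System.out.println(clean(html, null, Safelist.none()));

        // Case 2: <div> elements and anything with class "ad" vanish entirely.
        System.out.println(clean(html, "div, .ad", Safelist.none()));
    }
}
```

Passing null (or an empty string) as dropQuery gives the default keep-the-text behaviour; any CSS selector jsoup understands, such as "div, .ad", marks the elements whose text should go together with their tags.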