Parse HTML with jsoup and remove the tag block
Question
I want to remove everything inside a tag, along with the tag itself. For example, given this input:
<body>
start
<div>
delete from below
<div class="XYZ">
first div having this class
<div>
waste
</div>
<div class="XYZ">
second div having this class
</div>
waste
</div>
delete till above
</div>
<div>
this will also remain
</div>
end
</body>
The output should be:
<body>
start
<div>
delete from below
delete till above
</div>
<div>
this will also remain
</div>
end
</body>
Basically, I have to remove the entire block that starts at the first occurrence of <div class="XYZ">, including anything nested inside it.
Thanks,
Recommended answer
You'd better iterate over all the elements found, so you can be sure that
- a.) all matching elements are removed, and
- b.) nothing is done if there is no matching element.
Example:
Document doc = ...
for( Element element : doc.select("div.XYZ") )
{
element.remove();
}
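The loop above can be tried end to end with a small self-contained program. This is only a sketch: the class name `RemoveAllXyz`, the helper `removeAll`, and the shortened sample HTML are made up for illustration, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RemoveAllXyz {

    // Hypothetical helper: parses the HTML, removes every element matching
    // the CSS query, and returns the inner HTML of <body>.
    static String removeAll(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        // select() returns an empty list when nothing matches,
        // so the loop body is simply skipped in that case.
        for (Element element : doc.select(cssQuery)) {
            element.remove(); // detaches the element and its children from the DOM
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question (an assumption).
        String html =
              "<div>delete from below"
            + "<div class=\"XYZ\">first<div>waste</div>"
            + "<div class=\"XYZ\">second</div>waste</div>"
            + "delete till above</div>";
        System.out.println(removeAll(html, "div.XYZ"));
    }
}
```

If the loop is not needed for anything else, `doc.select("div.XYZ").remove()` removes every matched element in one call.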
Edit:
(An addition to my comment)
Don't use exception handling when a simple null / range check is enough. Rather than this, which throws a NullPointerException if nothing matches:
doc.select("div.XYZ").first().remove();
do this instead:
Elements divs = doc.select("div.XYZ");
if( !divs.isEmpty() )
{
/*
* Here it's safe to call 'first()' since there is at least one element.
*/
divs.first().remove();
}
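To remove only the first match, as the question asks, the emptiness check can be wrapped in a small method. This is a minimal sketch, not a definitive implementation: the class name `RemoveFirstXyz`, the helper `removeFirst`, and the sample HTML are hypothetical, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class RemoveFirstXyz {

    // Hypothetical helper: removes the first element matching the CSS query.
    // Returns true if an element was removed, false if nothing matched.
    static boolean removeFirst(Document doc, String cssQuery) {
        Elements matches = doc.select(cssQuery);
        if (matches.isEmpty()) {
            return false; // nothing to remove, and no exception handling needed
        }
        // Safe: the list holds at least one element, so first() is not null.
        matches.first().remove();
        return true;
    }

    public static void main(String[] args) {
        // Sample markup is an assumption made for this illustration.
        Document doc = Jsoup.parse("<div>keep<div class=\"XYZ\">drop</div></div>");
        System.out.println(removeFirst(doc, "div.XYZ")); // a match exists
        System.out.println(removeFirst(doc, "div.XYZ")); // no match is left
        System.out.println(doc.body().html());
    }
}
```

Because the first `div.XYZ` in the question's input contains the second one, removing just the first match already produces the requested output.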