How to extract tags and text between tags to a list with JSoup


Question

I have the following HTML:

<div class="CustomClass">
    Hi!<br/>
    <br/>
    Bla Bla bla<br/>
    <br/>
    <a href...></a>
    bla bla bla
    <iframe...></iframe>
    Thank you!
</div>

I need a list of the div's children, both tags and text, something like the following:

0->Hi!
2-><br/>
3->Bla Bla bla
4-><br/>
5-><a href...></a>
6->bla bla bla
7-><iframe...></iframe>
8->Thank you!

I tried getting the children of the div element, then iterating over them and converting each one to HTML, but this returns only the tag elements and ignores the text between them. Ideally the text would be wrapped in p tags, but that is not the case.

If I call element.ownText() on the div element, I get the text without the tags, but I need both, and in the right order.

Is there a way to achieve that?

Thanks!

Recommended answer

You can use childNodes() to obtain a list of Node objects. Unlike children(), which returns only elements, it includes the text nodes as well, in document order, which is exactly what you need:

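import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import java.util.List;

Document doc = Jsoup.parse("<div class=\"CustomClass\">Hi!<br/><br/>Bla Bla bla<br/><br/><a href...></a>bla bla bla<iframe></iframe>Thank you!</div>");
Element div = doc.selectFirst(".CustomClass");
// childNodes() returns text nodes and elements together, in document order
List<Node> childNodes = div.childNodes();
for (int i = 0; i < childNodes.size(); i++) {
    Node node = childNodes.get(i); // reuse the list fetched above
    System.out.println(i + " -> " + node);
}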

Output:

0 -> 
Hi!
1 -> <br>
2 -> <br>
3 -> Bla Bla bla
4 -> <br>
5 -> <br>
6 -> <a href...></a>
7 -> bla bla bla
8 -> <iframe></iframe>
9 -> Thank you!
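
If the goal is a list of strings rather than printed lines, the same loop can collect each child into a List<String>. The following is a minimal sketch, not part of the original answer: the class name ChildNodeLister and the method listChildren are hypothetical helpers, the href="#" value is a placeholder for the elided attribute in the question, and skipping whitespace-only text nodes and turning off pretty-printing are assumptions about the desired result.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

class ChildNodeLister {

    // Hypothetical helper (not part of jsoup): returns the direct children of the
    // first element matching cssQuery, as strings, in document order.
    static List<String> listChildren(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        // Assumption: disable pretty-printing so no extra newlines or indentation are added.
        doc.outputSettings().prettyPrint(false);

        List<String> items = new ArrayList<>();
        Element parent = doc.selectFirst(cssQuery);
        if (parent == null) {
            return items; // nothing matched the selector
        }

        for (Node node : parent.childNodes()) {
            if (node instanceof TextNode) {
                TextNode text = (TextNode) node;
                // Assumption: whitespace-only text between tags is not wanted.
                if (!text.isBlank()) {
                    items.add(text.text().trim());
                }
            } else if (node instanceof Element) {
                // Keep the tag itself, including its attributes and content.
                items.add(node.outerHtml());
            }
            // Other node types (comments, etc.) are skipped in this sketch.
        }
        return items;
    }

    public static void main(String[] args) {
        // href="#" is a placeholder; the question elides the real attribute.
        String html = "<div class=\"CustomClass\">Hi!<br><br>Bla Bla bla<br>"
                + "<a href=\"#\"></a>bla bla bla<iframe></iframe>Thank you!</div>";
        List<String> items = listChildren(html, ".CustomClass");
        for (int i = 0; i < items.size(); i++) {
            System.out.println(i + " -> " + items.get(i));
        }
    }
}
```

Checking instanceof TextNode versus Element also makes it easy to treat the two kinds of children differently, for example wrapping the text in p tags as the question mentions.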
