How to get orphaned text with Jsoup?

Question

I have the following HTML:

<span>This is the first text</span>
More text here 
Another line of text
<span>Text in the span</span>
<span>Another text in span</span>
This is another line

I want to get all the texts in document order, as an array like this:

[
"Span:This is the first text",
"More text here",
"Another line of text",
"Span:Text in the span",
"Span:Another text in span",
"This is another line",
]
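In jsoup's model, the lines that sit outside any <span> are not lost: they become TextNode children of <body>, interleaved with the span Elements. The sketch below is a minimal way to inspect that structure, assuming jsoup is on the classpath. The class name JsoupNodeKinds and the method childKinds are hypothetical, and the HTML is inlined as a string instead of being read from a file.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

// Hypothetical helper class: lists the node types directly under <body>.
public class JsoupNodeKinds {
    // The question's HTML, inlined so no local file is needed.
    static final String HTML =
            "<span>This is the first text</span>\n"
            + "More text here \n"
            + "Another line of text\n"
            + "<span>Text in the span</span>\n"
            + "<span>Another text in span</span>\n"
            + "This is another line";

    // Returns the jsoup class name of each direct child of <body>, in document order.
    static List<String> childKinds(String html) {
        Element body = Jsoup.parse(html).body();
        List<String> kinds = new ArrayList<>();
        for (Node n : body.childNodes()) {
            kinds.add(n.getClass().getSimpleName());
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(childKinds(HTML));
    }
}
```

The orphaned lines show up as TextNode entries between the span Elements, including a whitespace-only TextNode for the line break between the two adjacent spans.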


Answer

I would use a recursive method that takes your starting tag and iterates over its child nodes. For each TextNode, print its contents. For each Element, recurse into it so that its own child nodes are processed the same way.

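import java.io.File;
import java.io.IOException;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class PrintOrphanText // class name is arbitrary
{
    public static void main(String[] args) throws IOException
    {
        // I put your HTML in the body tag in a local file
        Document doc = Jsoup.parse(new File("input/20160505.html"), "UTF-8");
        Element rootTag = doc.body();
        printTextOfTag(rootTag);
    }

    // Prints every text node under currentTag, in document order
    public static void printTextOfTag(Element currentTag)
    {
        List<Node> nodes = currentTag.childNodes();
        for (Node n : nodes)
        {
            if (n instanceof TextNode)
            {
                // Orphaned text and text inside elements are both TextNodes
                System.out.println(((TextNode) n).text());
            }
            else if (n instanceof Element)
            {
                // Recurse into child elements such as <span>
                printTextOfTag((Element) n);
            }
        }
    }
}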
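If you want the texts collected in a list, like the array in the question, instead of printed, the same recursion can append to a list. This is a minimal sketch assuming jsoup is on the classpath; OrphanTextCollector, collect and collectTexts are hypothetical names, the HTML is inlined rather than read from input/20160505.html, and whitespace-only text nodes are skipped with TextNode.isBlank() so they do not produce empty entries.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical variant of printTextOfTag that returns the texts instead of printing them.
public class OrphanTextCollector {
    // The question's HTML, inlined so no local file is needed.
    static final String HTML =
            "<span>This is the first text</span>\n"
            + "More text here \n"
            + "Another line of text\n"
            + "<span>Text in the span</span>\n"
            + "<span>Another text in span</span>\n"
            + "This is another line";

    // Recursively walks currentTag's child nodes and appends each non-blank text node, trimmed.
    static void collectTexts(Element currentTag, List<String> out) {
        for (Node n : currentTag.childNodes()) {
            if (n instanceof TextNode) {
                TextNode t = (TextNode) n;
                if (!t.isBlank()) {
                    out.add(t.text().trim());
                }
            } else if (n instanceof Element) {
                collectTexts((Element) n, out);
            }
        }
    }

    // Parses html and collects the texts under <body> in document order.
    static List<String> collect(String html) {
        List<String> out = new ArrayList<>();
        collectTexts(Jsoup.parse(html).body(), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(HTML));
    }
}
```

Because this still uses TextNode.text(), which collapses whitespace, the two orphaned lines between the first and second span come back as a single entry, "More text here Another line of text".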

Output

This is the first text

 More text here Another line of text 

Text in the span



Another text in span

 This is another line
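The output above prints "More text here" and "Another line of text" on one line because TextNode.text() normalizes whitespace, newlines included. TextNode.getWholeText() returns the raw text, so splitting it on newlines recovers the separate lines. The sketch below also adds the "Span:" label from the question. The labelling rule (prefix the capitalized tag name of the text's direct parent, unless that parent is <body>) is an assumption, as are the names LabeledTextCollector, extract, collectLines and labelFor.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical sketch that produces labelled, line-split texts like the question's array.
public class LabeledTextCollector {
    static final String HTML =
            "<span>This is the first text</span>\n"
            + "More text here \n"
            + "Another line of text\n"
            + "<span>Text in the span</span>\n"
            + "<span>Another text in span</span>\n"
            + "This is another line";

    // Assumed rule: text directly under root gets no label; otherwise "span" -> "Span:".
    static String labelFor(Element parent, Element root) {
        if (parent == root) {
            return "";
        }
        String tag = parent.tagName();
        return Character.toUpperCase(tag.charAt(0)) + tag.substring(1) + ":";
    }

    static void collectLines(Element currentTag, Element root, List<String> out) {
        for (Node n : currentTag.childNodes()) {
            if (n instanceof TextNode) {
                // getWholeText() keeps the original newlines, unlike text(), which collapses them.
                String whole = ((TextNode) n).getWholeText();
                for (String line : whole.split("\n")) {
                    String trimmed = line.trim();
                    if (!trimmed.isEmpty()) {
                        out.add(labelFor(currentTag, root) + trimmed);
                    }
                }
            } else if (n instanceof Element) {
                collectLines((Element) n, root, out);
            }
        }
    }

    static List<String> extract(String html) {
        Element body = Jsoup.parse(html).body();
        List<String> out = new ArrayList<>();
        collectLines(body, body, out);
        return out;
    }

    public static void main(String[] args) {
        for (String s : extract(HTML)) {
            System.out.println(s);
        }
    }
}
```

For the question's HTML this is intended to yield the six entries of the requested array, in order. Splitting on newlines assumes each orphaned line sits on its own source line; text that wraps across lines inside a span would be split as well.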
