How to edit all text values in HTML tags using jsoup

Question

What I want: I am new to jsoup. I want to parse my HTML string, find every text value that appears inside a tag (any tag), and change each of those text values to something else.

What I have done: I am able to change the text value of a single kind of tag (<p>). Below is the code:

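import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ChangePText {
    public static void main(String[] args) {
        String html = "<div><p>Test Data</p> <p>HELLO World</p></div>";
        Document doc1 = Jsoup.parse(html);
        // Select only the <p> elements and replace the text of each one
        Elements ps = doc1.getElementsByTag("p");
        for (Element p : ps) {
            String pText = p.text();
            p.text(base64_Dummy(pText));
        }
        System.out.println("======================");
        String changedHTML = doc1.html();
        System.out.println(changedHTML);
    }

    // Placeholder transformation: ignores its input and returns fixed text
    public static String base64_Dummy(String abc) {
        return "This is changed text";
    }
}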

Output:

======================
<html>
 <head></head>
 <body>
  <div>
   <p>This is changed text</p> 
   <p>This is changed text</p>
  </div>
 </body>
</html>

The code above changes the text of the <p> tags. But in my case the HTML string can contain any tag, and I want to find and change the text inside each of them. How can I search all the tags in an HTML string and change their text values one by one?
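
Simply calling text(...) on every element would not generalize: jsoup's Element.text(String) clears all of an element's existing children before setting the new text, so applying it to a <div> would also delete the <p> tags inside it. Below is a minimal sketch of widening the loop to any tag, assuming it is enough to rewrite each element's own non-blank text nodes. Element.textNodes() returns only an element's direct text children, and those TextNode objects can be modified in place. The class name ChangeAllTagText and the method changeAllText are hypothetical; base64_Dummy is the placeholder from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class ChangeAllTagText {

    // Hypothetical helper: rewrites every non-blank text node, whatever tag it sits in,
    // and returns the resulting <body> inner HTML.
    static String changeAllText(String html) {
        Document doc = Jsoup.parse(html);
        // getAllElements() returns the root and every descendant element
        for (Element el : doc.getAllElements()) {
            // textNodes() returns only this element's direct text children,
            // so nested tags stay intact and no text is visited twice
            for (TextNode tn : el.textNodes()) {
                if (!tn.isBlank()) {
                    tn.text(base64_Dummy(tn.text()));
                }
            }
        }
        return doc.body().html();
    }

    // Same placeholder as in the question: ignores its input
    public static String base64_Dummy(String abc) {
        return "This is changed text";
    }

    public static void main(String[] args) {
        System.out.println(changeAllText("<div><p>Test Data</p> <span>HELLO <b>World</b></span></div>"));
    }
}
```

With the sample input in main, the three text runs "Test Data", "HELLO " and "World" are each replaced, while the <p>, <span> and <b> tags stay in place; the whitespace-only text between </p> and <span> is skipped by the isBlank() check.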

Recommended answer

You can try something similar to this code:

import java.util.ArrayDeque;
import java.util.Queue;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class FindAllText {
    public static void main(String[] args) {
        String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";

        Document doc = Jsoup.parse(html);

        // We will search nodes in a breadth-first way
        Queue<Node> nodes = new ArrayDeque<>();
        nodes.addAll(doc.childNodes());

        while (!nodes.isEmpty()) {
            Node n = nodes.remove();

            if (n instanceof TextNode && ((TextNode) n).text().trim().length() > 0) {
                // Do whatever you want with n.
                // Here we just print its text...
                System.out.println(n.parent().nodeName() + " contains text: " + ((TextNode) n).text().trim());
            } else {
                // Not a non-blank text node: queue its children for later
                nodes.addAll(n.childNodes());
            }
        }
    }
}

And you'll get the following output:

body contains text: other text
p contains text: Test Data
p contains text: HELLO World
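
The answer above only prints each text node. To actually change the text, the same check can write back through TextNode.text(String). Below is a minimal sketch, assuming jsoup 1.11.1 or newer (where NodeTraversor.traverse(NodeVisitor, Node) is a static method), that swaps the hand-written breadth-first queue for jsoup's own depth-first NodeTraversor. The names EditTextNodes and editAllText, and the upper-casing function in main, are hypothetical stand-ins for whatever transformation you need (for example the question's base64_Dummy).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

public class EditTextNodes {

    // Hypothetical helper: applies `change` to every non-blank text node in document
    // order and returns the original (trimmed) texts that were replaced.
    static List<String> editAllText(Document doc, UnaryOperator<String> change) {
        List<String> seen = new ArrayList<>();
        NodeTraversor.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    TextNode tn = (TextNode) node;
                    if (!tn.isBlank()) {
                        String original = tn.text().trim();
                        seen.add(original);
                        // Only the text changes; the tree structure is untouched,
                        // so editing during the traversal is safe
                        tn.text(change.apply(original));
                    }
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // nothing to do on the way back up
            }
        }, doc);
        return seen;
    }

    public static void main(String[] args) {
        String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";
        Document doc = Jsoup.parse(html);
        List<String> seen = editAllText(doc, s -> s.toUpperCase());
        System.out.println(seen);
        System.out.println(doc.body().html());
    }
}
```

Because the traversal is depth-first, text nodes are visited in document order ("Test Data", "HELLO World", "other text"), unlike the breadth-first queue in the answer, which reaches "other text" first. The sketch trims each text before transforming it, so whitespace around a text run (such as the space before "other text") is dropped; pass tn.text() untrimmed if that spacing matters.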
