How to change HTML tag content in Java?


Question

How can I change the HTML content of a tag in Java? For example:

Before:

<html>
    <head>
    </head>
    <body>
        <div>text<div>**text**</div>text</div>
    </body>
</html>

After:

<html>
    <head>
    </head>
    <body>
        <div>text<div>**new text**</div>text</div>
    </body>
</html>

I tried JTidy, but it doesn't support getTextContent. Is there any other solution?


Thanks. I want to parse HTML that is not well formed. I tried TagSoup, but when I have this code:

<body>
sometext <div>text</div>
</body>

and I want to change "sometext" to "someAnotherText", {bodyNode}.getTextContent() gives me "sometext text". When I call setTextContent("someAnotherText" + {bodyNode}.getTextContent()) and serialize the structure, the result is <body>someAnotherText sometext text</body>, without the <div> tags. This is a problem for me.

Solution

Unless you are absolutely sure that the HTML will be valid and well formed, I'd strongly recommend using an HTML parser such as TagSoup, Jericho, NekoHTML, or HTML Parser. The first two are especially good at parsing any kind of crap :)

For example, with HTML Parser (chosen because the implementation is very easy), you can use a visitor and provide your own NodeVisitor:

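import org.htmlparser.Text;
import org.htmlparser.visitors.NodeVisitor;

// Visits every node and rewrites the text nodes that match exactly.
public class MyNodeVisitor extends NodeVisitor {
    @Override
    public void visitStringNode(Text string) {
        // getText() is the text of this text node only, not of any child elements
        if (string.getText().equals("**text**")) {
            string.setText("**new text**");
        }
    }
}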

Then create a Parser, parse the HTML string, and visit the returned node list:

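import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;

// These calls can throw org.htmlparser.util.ParserException
Parser parser = new Parser(htmlString);
NodeList nl = parser.parse(null); // null filter: no nodes are filtered out
nl.visitAllNodesWith(new MyNodeVisitor());
System.out.println(nl.toHtml());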

This is just one way to do it, and a pretty straightforward one.
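For the follow-up case in the question, where setTextContent removed the inner <div>, the same idea applies to a W3C DOM: change only the text nodes instead of the text content of the whole element. The sketch below is an illustration, not part of the original answer. It assumes the input is well formed so that the JDK's own XML parser can build the DOM (with real-world HTML, the DOM would come from a tolerant parser such as TagSoup instead), and the class name TextNodeEdit and the helpers replaceOwnText and rewrite are hypothetical names made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class; assumes well-formed input (see the note above).
public class TextNodeEdit {

    /** Replaces oldText with newText in the direct text-node children of parent only. */
    public static void replaceOwnText(Node parent, String oldText, String newText) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            // Element children such as <div> are skipped, so they stay in the tree.
            if (child.getNodeType() == Node.TEXT_NODE) {
                child.setNodeValue(child.getNodeValue().replace(oldText, newText));
            }
        }
    }

    /** Parses the markup, edits the root element's own text, and serializes the result. */
    public static String rewrite(String xml, String oldText, String newText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        replaceOwnText(doc.getDocumentElement(), oldText, newText);

        // Identity transform: writes the DOM back out as markup.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<body>sometext <div>text</div></body>";
        System.out.println(rewrite(input, "sometext", "someAnotherText"));
    }
}
```

The difference from setTextContent is that setTextContent replaces all children of the element with a single text node, which is why the <div> disappeared, whereas setNodeValue on a text node touches only that node.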
