How to change HTML tag content in Java?
How can I change the HTML content of a tag in Java? For example:
before:
<html>
<head>
</head>
<body>
<div>text<div>**text**</div>text</div>
</body>
</html>
after:
<html>
<head>
</head>
<body>
<div>text<div>**new text**</div>text</div>
</body>
</html>
I tried JTidy, but it doesn't support getTextContent. Is there any other solution?
Thanks. I want to parse HTML that is not well-formed. I tried TagSoup, but when I have this code:
<body>
sometext <div>text</div>
</body>
and I want to change "sometext" to "someAnotherText", calling {bodyNode}.getTextContent() gives me "sometext text". When I use setTextContent("someAnotherText" + {bodyNode}.getTextContent()) and serialize the structure, the result is <body>someAnotherText sometext text</body>, without the <div> tags. This is a problem for me.
Unless you are absolutely sure that the HTML will be valid and well-formed, I'd strongly recommend using an HTML parser such as TagSoup, Jericho, NekoHTML, or HTML Parser. The first two are especially good at parsing any kind of crap :)
For example, with HTML Parser (because the implementation is very easy), you can use a visitor by providing your own NodeVisitor:
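import org.htmlparser.Text;
import org.htmlparser.visitors.NodeVisitor;

public class MyNodeVisitor extends NodeVisitor {

    // Called for every text node the visitor encounters.
    @Override
    public void visitStringNode(Text string) {
        // Replace the node's text only on an exact match.
        if (string.getText().equals("**text**")) {
            string.setText("**new text**");
        }
    }
}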
Then, create a Parser, parse the HTML string and visit the returned node list:
Parser parser = new Parser(htmlString);
NodeList nl = parser.parse(null);
nl.visitAllNodesWith(new MyNodeVisitor());
System.out.println(nl.toHtml());
This is just one way to implement this, and it is pretty straightforward.
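If you already have a W3C DOM tree (for instance one built from a lenient parser's output), the same idea can be applied there. The <div> disappears in the question because setTextContent replaces all of a node's children with a single text node. Editing the individual text nodes instead leaves the elements alone. Below is a minimal sketch using only the JDK. The class and method names (DomTextEdit, replaceText, rewrite) are made up for illustration, and it assumes the markup is already well-formed, since the JDK's XML parser will reject tag soup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any of the libraries named above.
public class DomTextEdit {

    // Walk the tree and rewrite only text nodes whose trimmed content matches.
    // Element nodes are never replaced, so nested tags survive.
    static void replaceText(Node node, String oldText, String newText) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            if (node.getNodeValue().trim().equals(oldText)) {
                node.setNodeValue(newText);
            }
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            replaceText(child, oldText, newText);
        }
    }

    // Assumption: the input is well-formed markup; malformed HTML would need
    // a lenient parser in front of this step.
    static String rewrite(String wellFormedMarkup, String oldText, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
            replaceText(doc.getDocumentElement(), oldText, newText);

            // Serialize the modified tree back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite markup", e);
        }
    }

    public static void main(String[] args) {
        String html = "<body>sometext <div>text</div></body>";
        System.out.println(rewrite(html, "sometext", "someAnotherText"));
    }
}
```

Because the match is done on the trimmed text, the replacement drops the whitespace around "sometext". If that whitespace matters, compare and replace against the untrimmed node value instead.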