How to get HTML from an org.w3c.dom.Node in Java?

Views: 339. Published: 2020/10/25 19:37:13. Tags: java, dom, xpath, saxon.

Question

I've built a method that extracts data from an HTML document using the XPath components of Saxon-HE. I'm using the W3C DOM object model for this.

I have already written a method that returns a node's text value, similar to jsoup's text method (jsoupElement.text()):

    // Returns the value of the first direct text child, or "" if there is none.
    protected String getNodeValue(Node node) {
        NodeList childNodes = node.getChildNodes();
        for (int x = 0; x < childNodes.getLength(); x++) {
            Node data = childNodes.item(x);
            if (data.getNodeType() == Node.TEXT_NODE)
                return data.getNodeValue();
        }
        return "";
    }
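For comparison, here is a minimal sketch of a text accessor built on the standard `Node.getTextContent()` method, which concatenates the text of the node and all of its descendants rather than stopping at the first direct text child. The class name `NodeText` and the `parse` helper are hypothetical and exist only to make the example self-contained. Unlike jsoup's `text()`, this does not normalize whitespace.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

class NodeText {

    /** Returns the concatenated text of the node and all its descendants. */
    static String getText(Node node) {
        // getTextContent() returns null for Document and DocumentType nodes.
        String text = node.getTextContent();
        return text == null ? "" : text;
    }

    /** Hypothetical demo helper: parses a well-formed XHTML snippet into a DOM. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse demo input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div>Hello <b>big</b> world</div>");
        // The text inside <b> is included, not only the first text child.
        System.out.println(getText(doc.getDocumentElement()));
    }
}
```

With the question's `getNodeValue`, the same `<div>` would yield only the first text child (`"Hello "`), so which variant fits depends on whether nested text is wanted.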

This works fine, but now I need the underlying HTML of a selected node (with jsoup it would be jsoupElement.html()). With the W3C DOM object model I have an org.w3c.dom.Node. How can I get the HTML from an org.w3c.dom.Node as a String? I couldn't find anything about this in the documentation.

Just to clarify: I need the inner HTML (with or without the node's own element/tag) as a String, similar to http://api.jquery.com/html/ or http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#html--

Recommended answer

To serialize the child nodes of a W3C DOM Node to HTML with Saxon, you can use a default Transformer with the output method set to html:

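    import java.io.StringWriter;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    // Serializes each child of the node in turn, so the node's own tag is not included.
    public static String getInnerHTML(Node node) throws TransformerException
    {
        StringWriter sw = new StringWriter();
        Result result = new StreamResult(sw);
        TransformerFactory factory = new net.sf.saxon.TransformerFactoryImpl();
        Transformer proc = factory.newTransformer();   // identity transformer
        proc.setOutputProperty(OutputKeys.METHOD, "html");
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
        {
            proc.transform(new DOMSource(children.item(i)), result);
        }
        return sw.toString();
    }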

Note that this is a serialization of the tree: the original XML or HTML markup is not stored in a DOM tree or in Saxon's tree model, so there is no way to access it.
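Since the question also asks for the variant that includes the node's own tag, here is a minimal sketch of an "outer HTML" method that passes the node itself to the transformer instead of looping over its children. It is a sketch under stated assumptions: it uses the JDK's built-in `TransformerFactory.newInstance()` rather than Saxon so that it runs without extra dependencies, and the names `OuterHtml`, `getOuterHTML`, and `parse` are hypothetical. The demo input uses single-quoted attributes, which makes it easy to see that the output is a fresh serialization and not the original markup.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;

class OuterHtml {

    /** Serializes the node itself (tag included) using the html output method. */
    static String getOuterHTML(Node node) throws TransformerException {
        StringWriter sw = new StringWriter();
        // Assumption: the JDK's default factory; substitute
        // new net.sf.saxon.TransformerFactoryImpl() to use Saxon instead.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "html");
        t.setOutputProperty(OutputKeys.INDENT, "no");
        t.transform(new DOMSource(node), new StreamResult(sw));
        return sw.toString();
    }

    /** Hypothetical demo helper: parses a well-formed XHTML snippet into a DOM. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse demo input", e);
        }
    }

    public static void main(String[] args) throws TransformerException {
        Document doc = parse("<p class='a'>Hi <b>there</b></p>");
        // The serializer chooses its own attribute quoting, so the
        // single quotes of the input are not preserved.
        System.out.println(getOuterHTML(doc.getDocumentElement()));
    }
}
```

Exact output details (whitespace, entity choices, handling of void elements) can differ between serializer implementations, so treat the result as equivalent HTML and not as a byte-for-byte copy of the source.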
