Using JAXB to extract content of several XML elements as text

Question

I have the following XML file:

<items>
   <title><a href="blabla">blabla</a></title>
   <text><a href="123">123</a></text>
</items>

I'm unmarshalling the XML to the following Java object with JAXB, using the XmlAnyElement annotation with two classes that implement DOMHandler. I want to extract the inner XML of the elements "title" and "text" as Strings.

public class Item implements Serializable {
    private String title;
    private String text;

    public String getTitle() {
        return title;
    }
    @XmlAnyElement(value = TitleHandler.class)
    public void setTitle(String title) {
        this.title = title;
    }
    public String getText() {
        return text;
    }
    @XmlAnyElement(value = TextHandler.class)
    public void setText(String text) {
        this.text = text;
    }
}

But when I put breakpoints in the method "String getElement(StreamResult rt)" of TitleHandler and TextHandler, both elements use TextHandler.class for unmarshalling. The element "title" uses TextHandler instead of TitleHandler. Any help will be greatly appreciated.

UPDATE: Usage constraint for the XmlAnyElement annotation: there can be only one XmlAnyElement-annotated JavaBean property in a class and its superclasses.

Recommended answer

The @XmlAnyElement annotation is used as a catch-all for elements in the XML input that aren't mapped by name to a specific property. That's why there can be only one such annotation per class (including inherited properties). What you want is this:

public class Item implements Serializable {
    private String title;
    private String text;

    public String getTitle() {
        return title;
    }
    @XmlElement(name = "title")
    @XmlJavaTypeAdapter(value = TitleHandler.class)
    public void setTitle(String title) {
        this.title = title;
    }
    public String getText() {
        return text;
    }
    @XmlElement(name = "text")
    @XmlJavaTypeAdapter(value = TextHandler.class)
    public void setText(String text) {
        this.text = text;
    }
}

The @XmlElement annotation indicates that the corresponding property is mapped to elements with that name. So the Java text property derives from the XML <text> element, and the title property from the <title> element. Since the property names and the element names are the same, this is also the default behavior without the @XmlElement annotations, so you could leave them out.

To convert the XML content to a String instead of an actual structure (like a Title class or Text class), you'll need an adapter. That's what the @XmlJavaTypeAdapter annotation is for. It specifies how marshalling and unmarshalling must be handled for that property.

See this helpful answer: https://stackoverflow.com/a/18341694/630136

An example of how TitleHandler could be implemented:

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.annotation.adapters.XmlAdapter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TitleHandler extends XmlAdapter<Object, String> {

    /**
     * Factory for building DOM documents.
     */
    private final DocumentBuilderFactory docBuilderFactory;
    /**
     * Factory for building transformers.
     */
    private final TransformerFactory transformerFactory;

    public TitleHandler() {
        docBuilderFactory = DocumentBuilderFactory.newInstance();
        transformerFactory = TransformerFactory.newInstance();
    }

    @Override
    public String unmarshal(Object v) throws Exception {
        // The provided Object is a DOM Element
        Element titleElement = (Element) v;
        // Getting the "a" child elements
        NodeList anchorElements = titleElement.getElementsByTagName("a");
        // If there's none or multiple, return empty string
        if (anchorElements.getLength() != 1) {
            return "";
        }
        Element anchor = (Element) anchorElements.item(0);
        // Creating a DOMSource as input for the transformer
        DOMSource source = new DOMSource(anchor);
        // Default transformer: identity transformer (doesn't alter input)
        Transformer transformer = transformerFactory.newTransformer();
        // This is necessary to avoid the <?xml ...?> prolog
        transformer.setOutputProperty("omit-xml-declaration", "yes");
        // Transform to a StringWriter
        StringWriter stringWriter = new StringWriter();
        StreamResult result = new StreamResult(stringWriter);
        transformer.transform(source, result);
        // Returning result as string
        return stringWriter.toString();
    }

    @Override
    public Object marshal(String v) throws Exception {
        // DOM document builder
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        // Creating a new empty document
        Document doc = docBuilder.newDocument();
        // Creating the <title> element
        Element titleElement = doc.createElement("title");
        // Setting as the document root
        doc.appendChild(titleElement);
        // Creating a DOMResult as output for the transformer
        DOMResult result = new DOMResult(titleElement);
        // Default transformer: identity transformer (doesn't alter input)
        Transformer transformer = transformerFactory.newTransformer();
        // String reader from the input and source
        StringReader stringReader = new StringReader(v);
        StreamSource source = new StreamSource(stringReader);
        // Transforming input string to the DOM
        transformer.transform(source, result);
        // Return DOM root element (<title>) for JAXB marshalling to XML
        return doc.getDocumentElement();
    }

}

If the type for the unmarshalling input/marshalling output is left as Object, JAXB will provide DOM nodes. The code above uses XSLT transformations (though without an actual stylesheet, just an "identity" transform) to turn the DOM input into a String and vice versa. I've tested it on a minimal input document and it works both for XML to an Item object and the other way around.
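To see the identity transform in isolation, here is a minimal sketch that uses only the JDK's built-in XML APIs, with no JAXB involved. The class name IdentityTransformDemo and its two helper methods are made up for illustration. It parses the sample <title> element and serializes its <a> child back to a String, which is the same step TitleHandler.unmarshal performs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original answer.
final class IdentityTransformDemo {

    // Parses an XML string and returns its root element as a DOM node.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Serializes a DOM element with a stylesheet-less (identity) transformer.
    static String serialize(Element element) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Suppress the <?xml ...?> declaration in the output
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(element), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize element", e);
        }
    }

    public static void main(String[] args) {
        Element title = parseRoot("<title><a href=\"blabla\">blabla</a></title>");
        Element anchor = (Element) title.getElementsByTagName("a").item(0);
        // Prints the markup of the <a> element without an XML declaration
        System.out.println(serialize(anchor));
    }
}
```

Running the transform outside JAXB like this can be a quick way to check what string the adapter will hand to the setter before wiring it into the Item class.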

The following version will handle any XML content in <title> rather than expecting a single <a> element. You'll probably want to turn this into an abstract class and then have TitleHandler and TextHandler extend it, so that the currently hardcoded <title> tags are provided by the implementation.

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.annotation.adapters.XmlAdapter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TitleHandler extends XmlAdapter<Object, String> {

    /**
     * Factory for building DOM documents.
     */
    private final DocumentBuilderFactory docBuilderFactory;
    /**
     * Factory for building transformers.
     */
    private final TransformerFactory transformerFactory;

    /**
     * XSLT that will strip the root element. Used to only take the content of an element given
     */
    private final static String UNMARSHAL_XSLT = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">\n" +
"\n" +
"    <xsl:output method=\"xml\" omit-xml-declaration=\"yes\" />\n" +
"\n" +
"    <xsl:template match=\"/*\">\n" +
"      <xsl:apply-templates select=\"@*|node()\"/>\n" +
"    </xsl:template>\n" +
"\n" +
"    <xsl:template match=\"@*|node()\">\n" +
"        <xsl:copy>\n" +
"            <xsl:apply-templates select=\"@*|node()\"/>\n" +
"        </xsl:copy>\n" +
"    </xsl:template>\n" +
"    \n" +
"</xsl:transform>";

    public TitleHandler() {
        docBuilderFactory = DocumentBuilderFactory.newInstance();
        transformerFactory = TransformerFactory.newInstance();
    }

    @Override
    public String unmarshal(Object v) throws Exception {
        // The provided Object is a DOM Element
        Element rootElement = (Element) v;
        // Creating a DOMSource as input for the transformer
        DOMSource source = new DOMSource(rootElement);
        // Creating a transformer that will strip away the root element
        StreamSource xsltSource = new StreamSource(new StringReader(UNMARSHAL_XSLT));
        Transformer transformer = transformerFactory.newTransformer(xsltSource);
        // Transform to a StringWriter
        StringWriter stringWriter = new StringWriter();
        StreamResult result = new StreamResult(stringWriter);
        transformer.transform(source, result);
        // Returning result as string
        return stringWriter.toString();
    }

    @Override
    public Object marshal(String v) throws Exception {
        // DOM document builder
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        // Creating a new empty document
        Document doc = docBuilder.newDocument();
        // Creating a DOMResult as output for the transformer
        DOMResult result = new DOMResult(doc);
        // Default transformer: identity transformer (doesn't alter input)
        Transformer transformer = transformerFactory.newTransformer();
        // String reader from the input and source
        StringReader stringReader = new StringReader("<title>" + v + "</title>");
        StreamSource source = new StreamSource(stringReader);
        // Transforming input string to the DOM
        transformer.transform(source, result);
        // Return DOM root element for JAXB marshalling to XML
        return doc.getDocumentElement();
    }

}
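The refactoring suggested above, where the element name is supplied by the caller instead of being hardcoded, might look roughly like the sketch below. It is not the answer's code: the class name InnerXmlConverter and its methods toInnerXml and toElement are hypothetical, and it deliberately avoids the JAXB types so that it runs on a plain JDK. One small deviation is marked in a comment: the root template selects only node(), so attributes on the wrapper element are dropped instead of being copied into the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: converts between a wrapper element and its inner XML.
final class InnerXmlConverter {

    // Stylesheet that drops the root element and copies everything inside it.
    // Assumption: the root template selects node() only, unlike the original's @*|node().
    private static final String STRIP_ROOT_XSLT =
        "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/*\"><xsl:apply-templates select=\"node()\"/></xsl:template>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:transform>";

    private final String elementName;
    private final TransformerFactory transformerFactory = TransformerFactory.newInstance();

    InnerXmlConverter(String elementName) {
        this.elementName = elementName;
    }

    // Unmarshal direction: DOM element in, inner XML string out.
    String toInnerXml(Element element) {
        try {
            StreamSource xslt = new StreamSource(new StringReader(STRIP_ROOT_XSLT));
            StringWriter writer = new StringWriter();
            transformerFactory.newTransformer(xslt)
                    .transform(new DOMSource(element), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not extract inner XML", e);
        }
    }

    // Marshal direction: inner XML string in, wrapper DOM element out.
    Element toElement(String innerXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            String wrapped = "<" + elementName + ">" + innerXml + "</" + elementName + ">";
            // Identity transform parses the wrapped string into the empty document
            transformerFactory.newTransformer()
                    .transform(new StreamSource(new StringReader(wrapped)), new DOMResult(doc));
            return doc.getDocumentElement();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build <" + elementName + "> element", e);
        }
    }
}
```

Under this design, TitleHandler and TextHandler would each hold an instance created with "title" or "text" and delegate unmarshal to toInnerXml and marshal to toElement. An abstract XmlAdapter base class taking the element name in its constructor would achieve the same thing.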
