How to grab text content wrapped in a CDATA tag from a piece of XML in Java

Problem description

I have the following XML:

<?xml version="1.0"?>
<doOrchestration xmlns="http://comResponse.engine/response">
    <response uuid="86db9b58-312b-4cbb-8aa5-df3663884291">
        <headers>
            <header name="Content-Type">application/xml</header>
            <header name="Server">local-C++</header>
        </headers>
        <responseCode>200</responseCode>
        <content><![CDATA[<explanation></explanation>]]></content>
    </response>
</doOrchestration>

I'd like to parse out the following text from the content node:

<![CDATA[<explanation></explanation>]]>

Notice that the content is wrapped in a CDATA section. How can I accomplish this in Java, using any method?

Here is my code:

@Test
public void testGetDoOrchResponse() throws IOException {
    String path = "/Users/haddad/Git/Tools/ContentUtils/src/test/resources/testdata/doOrch_testfiles/doOrch_response.xml";
    File f = new File(path);
    String response = FileUtils.readFileToString(f);

    String content = getDoOrchResponse(response, "content");
    System.out.println("Content: "+content);
}

// Output: Content: blank

static String getDoOrchResponse(String xml, String tagFragment) throws FileNotFoundException { 

    String content = new String();
    try {
        Document doc = getDocumentXML(xml);
        NodeList nlNodeExplanationList = doc.getElementsByTagName("response"); 
        for(int i=0;i<nlNodeExplanationList.getLength();i++) {
            Node explanationNode = nlNodeExplanationList.item(i); 

            List<String> titleList = getTextValuesByTagName((Element)explanationNode, tagFragment);
            content = titleList.get(0);
        }
    } 
    catch (IOException e) {
        e.printStackTrace();
    }
    return content;
}



static List<String> getTextValuesByTagName(Element element, String tagName) {
    NodeList nodeList = element.getElementsByTagName(tagName);
    ArrayList<String> list = new ArrayList<String>();
    for (int i = 0; i < nodeList.getLength(); i++) {

        String textValue = getTextValue(nodeList.item(i));

        if(textValue.equalsIgnoreCase("") ) {
            textValue = "blank";
        }
        list.add(textValue);
    }
    return list;
}

static String getTextValue(Node node) {
    StringBuffer textValue = new StringBuffer();
    int length = node.getChildNodes().getLength();
    for (int i = 0; i < length; i ++) {
        Node c = node.getChildNodes().item(i);
        if (c.getNodeType() == Node.TEXT_NODE) {
            textValue.append(c.getNodeValue());
        }
    }
    return textValue.toString().trim();
}


static Document getDocumentXML(String xml) throws FileNotFoundException {

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db;
    Document doc = null;

    try {
        db = dbf.newDocumentBuilder();
        doc = db.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
        doc.getDocumentElement().normalize();
    } 
    catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    }
    return doc;
}

What am I doing wrong? Why do I get blank as output? I just don't see it...

Recommended answer

If you want to extract the contents of an Element node, use the getTextContent() method. If you really need the CDATA section markup itself, you have to serialize that node with LSSerializer or similar:

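// Needs javax.xml.parsers.*, org.w3c.dom.*, org.w3c.dom.ls.* and java.io.File.
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
// The document declares a default namespace, so the parser must be namespace-aware.
docFactory.setNamespaceAware(true);
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

Document doc = docBuilder.parse(new File("doc1.xml"));

Element content = (Element) doc.getElementsByTagNameNS("http://comResponse.engine/response", "content").item(0);
if (content != null) {
    // Text inside the element, without the CDATA markup.
    System.out.println(content.getTextContent());

    // Serialize the first child (the CDATA section node) to get the markup as well.
    LSSerializer ser = ((DOMImplementationLS) doc.getImplementation()).createLSSerializer();
    if (content.getFirstChild() != null) {
        System.out.println(ser.writeToString(content.getFirstChild()));
    }
}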

That is the theory. For me, on Java JRE 1.8, this outputs <![CDATA[<explanation></explanation> without the closing markup for the CDATA section; it looks like LSSerializer does not work correctly with a single CDATA section node.
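As a possible workaround for the serializer quirk, here is a minimal, self-contained sketch. It is not part of the original answer, and the class name CdataDemo and the helpers findContent and withCdataMarkup are hypothetical names chosen for illustration. It assumes the parser is left at its default, non-coalescing setting, so the CDATA section arrives as its own node of type Node.CDATA_SECTION_NODE rather than Node.TEXT_NODE. That node type is also the likely reason the question's getTextValue helper returned an empty string, so the sketch includes a variant of that helper which accepts both types.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataDemo {
    static final String NS = "http://comResponse.engine/response";

    // Shortened copy of the XML from the question (headers omitted).
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<doOrchestration xmlns=\"" + NS + "\">"
        + "<response uuid=\"86db9b58-312b-4cbb-8aa5-df3663884291\">"
        + "<responseCode>200</responseCode>"
        + "<content><![CDATA[<explanation></explanation>]]></content>"
        + "</response></doOrchestration>";

    // Hypothetical helper: parse the XML and return the first <content> element.
    static Element findContent(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the document uses a default namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return (Element) doc.getElementsByTagNameNS(NS, "content").item(0);
    }

    // Variant of the question's getTextValue that also accepts CDATA section nodes.
    static String getTextValue(Node node) {
        StringBuilder text = new StringBuilder();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            short type = c.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(c.getNodeValue());
            }
        }
        return text.toString().trim();
    }

    // Hypothetical helper: rebuild the CDATA markup by hand instead of
    // relying on LSSerializer; falls back to plain text content otherwise.
    static String withCdataMarkup(Element element) {
        Node first = element.getFirstChild();
        if (first instanceof CDATASection) {
            return "<![CDATA[" + ((CDATASection) first).getData() + "]]>";
        }
        return element.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Element content = findContent(XML);
        System.out.println(content.getTextContent());   // text without CDATA markup
        System.out.println(getTextValue(content));      // same text via the fixed helper
        System.out.println(withCdataMarkup(content));   // text re-wrapped in CDATA markup
    }
}
```

Rebuilding the markup by string concatenation only looks at the first child node, which matches the document in the question; an element with mixed text and several CDATA sections would need a loop over all children.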
