How can I parse an XML file with HTML tags inside a CDATA section?


Problem description


<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<extendedinfo type="html">
    <![CDATA[<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>]]>
</extendedinfo>
<extendedinfo type="html">
    <![CDATA[<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>]]>
</extendedinfo>

I have an .xml file generated by a tool in the above format, with HTML data inside CDATA sections. Which parser should I use, or how else can I retrieve the HTML data from the XML file using Java?

Solution

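Just access the CDATA as text content. The parser does not interpret the characters inside a CDATA section as markup, so the HTML tags come back as plain text.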

Variant 1 (DOM):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public void getCDATAFromHardcodedPathWithDom() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    String cdataNode = "extendedinfo";
    try (InputStream in =
            new BufferedInputStream(new FileInputStream(yourSampleFile))) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(in);
        NodeList elements = doc.getElementsByTagName(cdataNode);
        for (int i = 0; i < elements.getLength(); i++) {
            Node e = elements.item(i);
            System.out.println(e.getTextContent());
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Variant 2 (stax):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public void getCDATAFromHardcodedPathWithStax() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    String cdataNode = "extendedinfo";
    XMLStreamReader r = null;
    try (InputStream in =
            new BufferedInputStream(new FileInputStream(yourSampleFile))) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        r = factory.createXMLStreamReader(in);
        while (r.hasNext()) {
            switch (r.getEventType()) {
            case XMLStreamConstants.START_ELEMENT:
                if (cdataNode.equals(r.getName().getLocalPart())) {
                    System.out.println(r.getElementText());
                }
                break;
            default:
                break;
            }
            r.next();
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    } finally {
        if (r != null) {
            try {
                r.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}

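With /path/toYour/sample/file.xml containing the following (note the added <root> wrapper: a well-formed XML document needs exactly one root element, so the two sibling <extendedinfo> elements from the question cannot stand alone):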

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<root>
<extendedinfo type="html">
    <![CDATA[<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>]]>
</extendedinfo>
<extendedinfo type="html">
    <![CDATA[<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>]]>
</extendedinfo>
</root>

Either variant prints the CDATA content of each element (plus the whitespace that surrounds it inside the element):

<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>


<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>
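As a self-contained illustration of the DOM variant, here is a minimal sketch that parses from an in-memory string instead of a hard-coded path and trims the whitespace around each CDATA section. The class name `CdataTextDemo`, the method `extractHtml`, and the shortened `SAMPLE` document are assumptions made for this example; they are not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class CdataTextDemo {

    // Hypothetical, shortened stand-in for the tool-generated file.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\" ?>\n"
        + "<root>\n"
        + "<extendedinfo type=\"html\">\n"
        + "    <![CDATA[<tr><td class=\"DefineCell\">58.675124</td></tr>]]>\n"
        + "</extendedinfo>\n"
        + "<extendedinfo type=\"html\">\n"
        + "    <![CDATA[<i>Set_Temperature = 23</i><br>]]>\n"
        + "</extendedinfo>\n"
        + "</root>\n";

    // Returns the trimmed text content of every element named tagName.
    static List<String> extractHtml(String xml, String tagName) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList elements = doc.getElementsByTagName(tagName);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            // getTextContent() concatenates text and CDATA children;
            // trim() drops the indentation and newlines around the CDATA.
            result.add(elements.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String html : extractHtml(SAMPLE, "extendedinfo")) {
            System.out.println(html);
        }
    }
}
```

Note that the extracted strings are HTML fragments rather than well-formed XML (for example, `<br>` is never closed), so feeding them back into an XML parser would fail. If the fragments need further processing, an HTML parser is the more suitable tool.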

An interesting alternative using JAXB is given here:

Retrieve value from CDATA

An example of how to extract only the CDATA sections is given here:

Unable to check CDATA in XML using XMLEventReader in Stax
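The linked StAX answer is not reproduced here. As an alternative illustration of collecting only CDATA sections, the sketch below uses DOM rather than `XMLEventReader`: with coalescing switched off (the default for `DocumentBuilderFactory`), CDATA sections are kept as separate nodes of type `Node.CDATA_SECTION_NODE`. The class name `CdataOnlyDemo`, the method names, and the tiny sample document are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CdataOnlyDemo {

    // Hypothetical sample mixing ordinary text and CDATA sections.
    static final String SAMPLE =
        "<root><a>plain text<![CDATA[<b>bold</b>]]></a>"
      + "<a><![CDATA[x < y]]></a></root>";

    // Returns the content of every CDATA section, in document order.
    static List<String> cdataSections(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // false is already the default; set explicitly so that CDATA
        // sections are not merged into neighbouring text nodes.
        factory.setCoalescing(false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), out);
        return out;
    }

    // Depth-first walk that keeps CDATA nodes and skips ordinary text.
    private static void collect(Node node, List<String> out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.CDATA_SECTION_NODE) {
                out.add(c.getNodeValue());
            } else if (c.getNodeType() == Node.ELEMENT_NODE) {
                collect(c, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        cdataSections(SAMPLE).forEach(System.out::println);
    }
}
```

Unlike `getTextContent()`, this approach ignores the ordinary text ("plain text" in the sample) and the whitespace between elements, which can be useful when an element mixes CDATA with other character data.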
