Java parse xml file when node inner text is html

Problem description

Right now I'm using SAXParser with my own handler. It can parse all node values except for the one that has type="html".

My characters function is like this:

public void characters(char ch[], int start, int length) throws SAXException {
    if (content) {
        String tmp = new String(ch, start, length);
        System.out.println("Content : " + tmp);
        content = false;
    }
}

That particular node has the following format, and for it my output is always just a bunch of \n and nothing else:

   <content type="html">

    &lt;img alt="" src="http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png" /&gt;


     &lt;p&gt;Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (&lt;i&gt;Lost&lt;/i&gt;, &lt;i&gt;Fringe&lt;/i&gt;, &lt;i&gt;Star Trek: Into Darkness&lt;/i&gt;, &lt;i&gt;Alias&lt;/i&gt;,&amp;nbsp;etc.), has released a&amp;nbsp;&lt;a href="http://youtu.be/FWaAZCaQXdo" target="_blank"&gt;mysterious new trailer&lt;/a&gt; titled "Stranger." The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. "Men are erased and reborn," intones a narrator that sounds a little like Leonard Nimoy.&lt;/p&gt;
     &lt;p&gt;&lt;/p&gt;



    </content>

Solution

You might be wrongly assuming that the characters callback occurs only once between the startElement and endElement callbacks. A SAX parser is free to deliver the text of a single element in several chunks, so characters can be called multiple times.

Since you use the content boolean member to decide whether to print anything, and also set that same member to false inside the characters callback, your condition is satisfied only once until you reset content (it is not clear where you do that). For this node the first chunk is probably just the whitespace that precedes the first &lt;, which would explain why you see only \n.
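
To see the chunking for yourself, here is a minimal sketch that records every chunk passed to characters. The class name ChunkDemo and the helper collectChunks are made up for illustration, and the number of chunks you get depends on the parser implementation, so treat the count as informational only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {

    // Hypothetical helper: parses the XML and returns every chunk
    // that the parser passed to characters(), in document order.
    static List<String> collectChunks(String xml) throws Exception {
        List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                chunks.add(new String(ch, start, length));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<content type=\"html\">\n  &lt;p&gt;hi&lt;/p&gt;\n</content>";
        List<String> chunks = collectChunks(xml);
        // How many calls were made is up to the parser implementation.
        System.out.println("characters() calls: " + chunks.size());
        // Joining the chunks gives the complete, entity-decoded text.
        System.out.println(String.join("", chunks));
    }
}
```

Whatever the count turns out to be, only the concatenation of all chunks is guaranteed to be the element's full text, which is why printing the first chunk and then clearing the flag loses data.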

Here's an example that works with your XML just fine (assumes non-mixed content and Java programming language):

import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TestSaxParser {

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        String xml = 
            "<content type=\"html\">\n" +
            "\n" +
            "    &lt;img alt=\"\" src=\"http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png\" /&gt;\n" +
            "\n" +
            "\n" +
            "     &lt;p&gt;Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (&lt;i&gt;Lost&lt;/i&gt;, &lt;i&gt;Fringe&lt;/i&gt;, &lt;i&gt;Star Trek: Into Darkness&lt;/i&gt;, &lt;i&gt;Alias&lt;/i&gt;,&amp;nbsp;etc.), has released a&amp;nbsp;&lt;a href=\"http://youtu.be/FWaAZCaQXdo\" target=\"_blank\"&gt;mysterious new trailer&lt;/a&gt; titled \"Stranger.\" The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. \"Men are erased and reborn,\" intones a narrator that sounds a little like Leonard Nimoy.&lt;/p&gt;\n" +
            "     &lt;p&gt;&lt;/p&gt;\n" +
            "\n" +
            "\n" +
            "\n" +
            "    </content>";

        MySaxHandler handler = new MySaxHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();        
        InputSource source = new InputSource(new StringReader(xml));
        parser.parse(source, handler);
    }

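    private static class MySaxHandler extends DefaultHandler {
        // Accumulates the text of the current element across characters() calls.
        private final StringBuilder content = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            // A new element starts: discard any text collected so far.
            content.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // May be invoked several times per element, so append each chunk.
            content.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            // The element is complete: the buffer now holds its full text.
            System.out.println(content.toString());
        }

    }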
}

Output:

    <img alt="" src="http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png" />


     <p>Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (<i>Lost</i>, <i>Fringe</i>, <i>Star Trek: Into Darkness</i>, <i>Alias</i>,&nbsp;etc.), has released a&nbsp;<a href="http://youtu.be/FWaAZCaQXdo" target="_blank">mysterious new trailer</a> titled "Stranger." The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. "Men are erased and reborn," intones a narrator that sounds a little like Leonard Nimoy.</p>
     <p></p>
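
The handler above clears its buffer on every startElement and prints on every endElement, which is fine for a single element but not for a feed with several elements. If you only want the text of the content element whose type is html, a variation along these lines might work. It is a sketch under assumptions: the names HtmlContentExtractor, HtmlContentHandler, and extract are invented here, the surrounding entry and title elements are a made-up sample, and the HTML inside content is assumed to be entity-escaped so that no child elements appear inside it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HtmlContentExtractor {

    // Hypothetical handler: buffers text only inside <content type="html">.
    static class HtmlContentHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private boolean inHtmlContent = false;
        private String result; // stays null if no matching element is found

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("content".equals(qName) && "html".equals(attributes.getValue("type"))) {
                inHtmlContent = true;
                buffer.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Keep the flag set across calls; append every chunk instead of printing one.
            if (inHtmlContent) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Reset the flag here, not in characters().
            if (inHtmlContent && "content".equals(qName)) {
                result = buffer.toString();
                inHtmlContent = false;
            }
        }

        String getResult() {
            return result;
        }
    }

    // Hypothetical helper: returns the decoded HTML text, or null if absent.
    static String extract(String xml) throws Exception {
        HtmlContentHandler handler = new HtmlContentHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getResult();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entry><title>Stranger</title>"
                + "<content type=\"html\">&lt;p&gt;Bad Robot&lt;/p&gt;</content></entry>";
        System.out.println(extract(xml)); // prints: <p>Bad Robot</p>
    }
}
```

The design point is the same as in the answer: the boolean is set in startElement and cleared in endElement, so every characters call in between contributes to the buffer.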
