Parsing the CDATA section in XML using XML Pull Parser


Question

Sample XML

<feed xmlns="http://www.w3.org/2005/Atom">
    <title>NDTV News - Top Stories</title>
    <link>http://www.ndtv.com/</link>
    <description>Latest entries</description>
    <language>en</language>
    <pubDate>Wed, 31 Jul 2013 22:33:00 GMT</pubDate>
    <lastBuildDate>Wed, 31 Jul 2013 22:33:00 GMT</lastBuildDate>
    <entry>
    <title>Narendra Modi to be BJP's PM candidate, announcement before crucial assembly polls: sources</title>
    <link>http://feedproxy.google.com/~r/NdtvNews-TopStories/~3/XN7dMIDe5YI/story01.htm</link>
    <published>Wed, 31 Jul 2013 13:58:31 GMT</published>
    <author>
    <name>user42715</name>
    </author>
    <content type="html"><![CDATA[<div align="center"><a href="http://www.ndtv.com/news/images/topstory_thumbnail/  Shatrughan_Sinha_agency_120.jpg"><img border="0" src="http://www.ndtv.com/news/images/topstory_thumbnail/Shatrughan_Sinha_agency_120.jpg" alt="2013-07-29-08-43-05" /></a></div><p><span style="font-size: large;">The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.</span><br /><br /><span style="font-size: large;"> The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September. </span><br /><br /><span style="font-size: large;">The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.   </span><br /><br /></p>]]></content>
   </entry>
</feed>

With the code below, I was able to retrieve the <title>, <published>, and <author> values within the <entry> tag.

        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        XmlPullParser parser = factory.newPullParser();
        InputStream urlStream = downloadUrl(urlString);
        parser.setInput(urlStream, null);
        int eventType = parser.getEventType();
        boolean done = false;

        while (eventType != XmlPullParser.END_DOCUMENT && !done) {
            tagName = parser.getName();

            switch (eventType) {
            case XmlPullParser.START_DOCUMENT:                  
                break;
            case XmlPullParser.START_TAG:
                if (tagName.equals("entry")) {                      
                }
                if (tagName.equals("title")) {
                    title = parser.nextText().toString();
                    Log.i(TITLE, title);
                }
                if (tagName.equals("published")) {
                    pubDate = parser.nextText().toString();
                    Log.i(PUBLISHEDDATE, pubDate);
                }

                if (tagName.equals("author")) {
                    readAuthor(parser);
                    Log.i(AUTHOR, author);
                }

                break;
            case XmlPullParser.END_TAG:
                if (tagName.equals("feed")) {
                    done = true;
                } else if (tagName.equals("entry")) {

                    rssFeed = new RssFeedStructure(title);
                    rssFeedList.add(rssFeed);
                }
                break;
            }
            eventType = parser.next();
        }

        private String readAuthor(XmlPullParser parser) throws IOException,
            XmlPullParserException {
            parser.nextTag();
            parser.require(XmlPullParser.START_TAG, null, "name");
            author = parser.nextText().toString();
            parser.require(XmlPullParser.END_TAG, null, "name");
            return author;
        }

From the <content> tag, how can I retrieve the "href" value within the <a> tag and the text value (The BJP is likely to anoint Narendra Modi.....) from the <span> tag?

Answer

You can use jsoup. Download it from http://jsoup.org/download and add the jar to the libs folder.

To test the parser, I copied the RSS feed into an XML file in the assets folder (locally).

XmlPullParser xpp = factory.newPullParser();
InputStream is = this.getAssets().open("xmlparser.xml");
xpp.setInput(is, "UTF-8");

You can use the code below, since you have the URL. I have shown how to extract the URL and the content. You need to extract the contents of the other tags as you normally would.

// Excerpt from the method that downloads the feed; urlStream is the
// InputStream opened from the feed URL.
try {
    XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
    XmlPullParser xpp = factory.newPullParser();
    xpp.setInput(urlStream, null);

    boolean insideItem = false;

    // Returns the type of the current event: START_TAG, END_TAG, etc.
    int eventType = xpp.getEventType();
    while (eventType != XmlPullParser.END_DOCUMENT) {
        if (eventType == XmlPullParser.START_TAG) {
            if (xpp.getName().equalsIgnoreCase("entry")) {
                insideItem = true;
            } else if (xpp.getName().equalsIgnoreCase("content") && insideItem) {
                // nextText() returns the CDATA body as a plain string,
                // which jsoup then parses as HTML.
                Document doc = Jsoup.parse(xpp.nextText());

                Elements links = doc.select("a[href]"); // <a> elements with an href
                for (Element link : links) {
                    Log.i("........", "" + link.attr("abs:href"));
                }

                // first() returns null when no <span> matches
                Element divcontent = doc.select("span").first();
                if (divcontent != null) {
                    Log.i("..........", "" + divcontent.text());
                }
            }
        } else if (eventType == XmlPullParser.END_TAG
                && xpp.getName().equalsIgnoreCase("entry")) {
            insideItem = false;
        }

        eventType = xpp.next(); // move to the next event
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (XmlPullParserException e1) {
    e1.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

Log:

08-03 08:03:04.413: I/........(1524): http://www.ndtv.com/news/images/topstory_thumbnail/   Shatrughan_Sinha_agency_120.jpg
08-03 08:03:04.423: I/..........(1524): The BJP is likely to anoint Narendra Modi as its prime ministerial candidate for the 2014 elections and make a formal announcement to that effect by September.
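
The reason this works is that a pull parser returns the body of a CDATA section as ordinary element text, so the HTML arrives as a plain string that can be passed to an HTML parser. Below is a minimal sketch of that step which runs on a plain JVM. Note the assumptions: it uses the JDK's StAX XMLStreamReader as a stand-in for Android's XmlPullParser (a different API with the same pull model), and the names CdataPullDemo and readFirstContent are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; StAX is used here in place of XmlPullParser.
final class CdataPullDemo {

    /** Returns the text of the first <content> inside an <entry>, or null if there is none. */
    static String readFirstContent(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        boolean insideEntry = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("entry")) {
                        insideEntry = true;
                    } else if (name.equals("content") && insideEntry) {
                        // getElementText() returns character data and CDATA
                        // bodies alike as one plain string.
                        return reader.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("entry")) {
                    insideEntry = false;
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<feed xmlns=\"http://www.w3.org/2005/Atom\"><entry>"
                + "<content type=\"html\"><![CDATA[<p><span>Hello</span></p>]]></content>"
                + "</entry></feed>";
        // The returned string is the raw HTML that was wrapped in CDATA.
        System.out.println(readFirstContent(xml));
    }
}
```

The string returned here corresponds to what xpp.nextText() gives you on Android, and it is that string that gets handed to Jsoup.parse().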

Edit: to loop through the elements

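Elements divcontent = doc.select("span");
// k starts at 1, which skips the first span (already logged above);
// start at 0 to include it.
for (int k = 1; k < divcontent.size(); k++) {
    String spancontent = divcontent.get(k).text();
    Log.i("..........", spancontent);
}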
