Cannot extract data from an XML document

Question

I'm using the getElementsByTag method to extract data from the following XML document (Yahoo Finance news API, http://finance.yahoo.com/rss/topfinstories).

I'm using the following code. It gets the news items and their titles without any problem using the getElementsByTag method, but for some reason it won't pick up the link when searching by tag. It only picks up the closing tag of the link element. Is this a problem with the XML document or a problem with jsoup?

import java.io.IOException;         
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;   

class GetNewsXML {
    /**
     * @param args command-line arguments (unused)
     */
    public static void main(String args[]){
        Document doc = null;
        String con = "http://finance.yahoo.com/rss/topfinstories";
        try {
            doc = Jsoup.connect(con).get();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        Elements collection = doc.getElementsByTag("item");// Gets each news item
        for (Element c: collection){
            System.out.println(c.getElementsByTag("title"));
        }
        for (Element c: collection){
            System.out.println(c.getElementsByTag("link"));
        }
    }
}

Recommended answer

You get <link /> http://...: jsoup's default HTML parser treats link as an empty element, so the URL ends up after the link tag as a text node.

But that's not a problem:

final String url = "http://finance.yahoo.com/rss/topfinstories";

Document doc = Jsoup.connect(url).get();


for( Element item : doc.select("item") )
{
    final String title = item.select("title").first().text();
    final String description = item.select("description").first().text();
    final String link = item.select("link").first().nextSibling().toString();

    System.out.println(title);
    System.out.println(description);
    System.out.println(link);
    System.out.println("");
}

Explanation:

item.select("link")  // Select the 'link' element of the item
    .first()         // Retrieve the first Element found (since there's only one)
    .nextSibling()   // Get the next sibling after the one found; it's the TextNode with the real URL
    .toString()      // Get it as a String

With your link, this example prints every item like this:

Tax Day Freebies and Deals
You made it through tax season. Reward yourself by taking advantage of some special deals on April 15.
http://us.rd.yahoo.com/finance/news/rss/story/SIG=14eetvku9/*http%3A//us.rd.yahoo.com/finance/news/topfinstories/SIG=12btdp321/*http%3A//finance.yahoo.com/news/tax-day-freebies-and-deals-133544366.html?l=1

(...)
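
If depending on the sibling text node feels fragile, another option is to parse the feed as XML rather than HTML, so that each link element keeps its URL as its own text. The following is only a minimal sketch using the JDK's built-in DOM parser instead of jsoup. The class name RssLinkDemo, the helper methods, and the inline SAMPLE feed are all hypothetical stand-ins (the sample only imitates the item/title/link shape of the real feed), and no network access is involved.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssLinkDemo {

    // Hypothetical sample feed; it only mimics the structure of an RSS document.
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel>"
      + "<title>Sample feed</title>"
      + "<link>http://example.com/</link>"
      + "<item><title>First story</title><link>http://example.com/a</link></item>"
      + "<item><title>Second story</title><link>http://example.com/b</link></item>"
      + "</channel></rss>";

    /** Returns one "title -> link" string for every item element in the RSS text. */
    static List<String> titlesAndLinks(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> result = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            result.add(firstText(item, "title") + " -> " + firstText(item, "link"));
        }
        return result;
    }

    /** Text content of the first descendant with the given tag, or "" if there is none. */
    static String firstText(Element parent, String tag) {
        NodeList found = parent.getElementsByTagName(tag);
        return found.getLength() == 0 ? "" : found.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        for (String line : titlesAndLinks(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Because an XML parser gives link no special meaning, the URL stays inside the element and can be read with getTextContent(). For the real feed, the XML would first have to be downloaded and passed to titlesAndLinks in place of SAMPLE.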
