Android: How to download RSS when a website contains <link rel="alternate" type="application/rss+xml">

Question

I am making an RSS-related app.
I want to be able to download the RSS (XML) feed given only the URL of a website that contains:

<link rel="alternate" type="application/rss+xml">

For example, the source of http://www.engadget.com includes:

<link rel="alternate" type="application/rss+xml" title="Engadget" href="http://www.engadget.com/rss.xml">

I guess that if I open this website in an RSS app,
it will redirect me to the http://www.engadget.com/rss.xml page.

My code to download the XML is as follows:

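import android.content.Context;
import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

private boolean downloadXml(String url, String filename) {
    try {
        URL urlxml = new URL(url);
        URLConnection ucon = urlxml.openConnection();
        ucon.setConnectTimeout(4000); // milliseconds
        ucon.setReadTimeout(4000);    // milliseconds
        // try-with-resources closes both streams, even when an exception is thrown
        try (InputStream is = new BufferedInputStream(ucon.getInputStream());
             // MODE_WORLD_READABLE/WRITEABLE are deprecated and throw a SecurityException
             // on Android 7.0 (API 24) and later; use Context.MODE_PRIVATE there.
             FileOutputStream fOut = openFileOutput(filename + ".xml",
                     Context.MODE_WORLD_READABLE | Context.MODE_WORLD_WRITEABLE)) {
            // Copy raw bytes; writing each byte through an OutputStreamWriter
            // would treat it as a char and corrupt non-ASCII content.
            byte[] buffer = new byte[4096];
            int n;
            while ((n = is.read(buffer)) != -1) {
                fOut.write(buffer, 0, n);
            }
        }
    } catch (Exception e) {
        return false;
    }
    return true;
}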

Without knowing the http://www.engadget.com/rss.xml URL in advance, how can I download the RSS feed when I input http://www.engadget.com?

Answer

To accomplish this, you need to:

  1. Detect whether the URL points to an HTML file. See the isHtml method in the code below.
  2. If the URL points to an HTML file, extract an RSS URL from it. See the extractRssUrl method in the code below.

The following code is a modified version of the code you pasted in your question. For I/O, I used Apache Commons IO for its useful IOUtils and FileUtils classes. IOUtils.toString is used to convert an input stream to a string, as recommended in the Stack Overflow question "In Java, how do I read/convert an InputStream to a String?"
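If you would rather not pull in Commons IO just for this step, the stream-to-string conversion can be written with the JDK alone. This is a minimal sketch, not part of the original answer; the class name StreamStrings is hypothetical, and like the answer's code it assumes the caller already knows the charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical JDK-only stand-in for IOUtils.toString(InputStream, String).
final class StreamStrings {
    private StreamStrings() {}

    /** Reads the stream to the end and decodes it with the given charset; does not close it. */
    static String readToString(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // Decode once at the end so a multi-byte character split across chunks stays intact
        return new String(buffer.toByteArray(), charset);
    }
}
```

In downloadXml, `IOUtils.toString(is, "UTF-8")` would then become `StreamStrings.readToString(is, StandardCharsets.UTF_8)`. Decoding once at the end, rather than chunk by chunk, is what keeps UTF-8 sequences that straddle a 4096-byte boundary from being split.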

extractRssUrl uses regular expressions to parse HTML, even though doing so is widely frowned upon. (See the rant in "RegEx match open tags except XHTML self-contained tags.") With this in mind, treat extractRssUrl only as a starting point: its regular expression is rudimentary and doesn't cover all cases.
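To make those limitations concrete, the following standalone sketch copies the regular expression from extractRssUrl and runs it against a few <link> tags. The class name RssLinkRegexDemo is hypothetical, and here the type attribute is matched as either application/rss+xml or application/atom+xml (the registered Atom MIME type). It shows that href may appear before or after type, but that single-quoted attributes are missed entirely.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RssLinkRegexDemo {
    // The answer's pattern: group 1 captures an href that appears before type,
    // group 2 one that appears after it. Only double-quoted attributes match.
    private static final Pattern RSS_LINK = Pattern.compile(
            "<link(?:\\s+href=\"([^\"]*)\"|\\s+[a-z\\-]+=\"[^\"]*\")*"
            + "\\s+type=\"application/(?:rss|atom)\\+xml\""
            + "(?:\\s+href=\"([^\"]*)\"|\\s+[a-z\\-]+=\"[^\"]*\")*?\\s*/?>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);

    static String extractRssUrl(String html) {
        Matcher matcher = RSS_LINK.matcher(html);
        if (matcher.find()) {
            // Return whichever href group actually matched
            for (int i = 1; i <= matcher.groupCount(); i++) {
                if (matcher.group(i) != null) {
                    return matcher.group(i);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // href after type (the Engadget case); prints http://www.engadget.com/rss.xml
        System.out.println(extractRssUrl("<link rel=\"alternate\" type=\"application/rss+xml\" title=\"Engadget\" href=\"http://www.engadget.com/rss.xml\">"));
        // href before type, Atom feed, self-closing tag; prints /feed.xml
        System.out.println(extractRssUrl("<link href=\"/feed.xml\" rel=\"alternate\" type=\"application/atom+xml\" />"));
        // single-quoted attributes are not matched; prints null
        System.out.println(extractRssUrl("<link rel='alternate' type='application/rss+xml' href='/rss.xml'>"));
    }
}
```

The second result, a relative path, is also a reminder that the extracted value is not always a URL you can open directly.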

Note that a call to isRss(str) is commented out. If you want to do RSS detection, see "How to detect if a page is an RSS or ATOM feed."
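If you do want an isRss check, one lightweight option, in the same regex-based spirit as the rest of this answer, is to look at the name of the document's root element: rss for RSS 0.91 to 2.0, rdf:RDF for RSS 1.0, and feed for Atom. This is a heuristic sketch under those assumptions, not the method from the linked question; FeedSniffer is a hypothetical class name, and a real XML parser would be more robust.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a heuristic root-element check, not a full feed validator.
final class FeedSniffer {
    // Skip an optional byte-order mark, whitespace, processing instructions
    // such as <?xml ...?>, comments and a simple DOCTYPE, then require the first
    // element to be <rss>, <rdf:RDF> or <feed>.
    private static final Pattern FEED_ROOT = Pattern.compile(
            "\\uFEFF?\\s*(?:<\\?.*?\\?>\\s*|<!--.*?-->\\s*|<!DOCTYPE[^>]*>\\s*)*"
            + "<(?:rss|rdf:RDF|feed)[\\s>/]",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private FeedSniffer() {}

    static boolean isRss(String str) {
        // lookingAt() anchors the match at the start of the string
        return FEED_ROOT.matcher(str).lookingAt();
    }
}
```

The commented-out isRss(str) check in the answer's downloadXml could then call FeedSniffer.isRss(str). An HTML page fails the check because its first element is <html>.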

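import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;

private boolean downloadXml(String url, String filename) {
    InputStream is = null;
    try {
        URL urlxml = new URL(url);
        URLConnection ucon = urlxml.openConnection();
        ucon.setConnectTimeout(4000);
        ucon.setReadTimeout(4000);
        is = ucon.getInputStream();
        String str = IOUtils.toString(is, "UTF-8"); // assumes a UTF-8 response
        if (isHtml(str)) {
            String rssURL = extractRssUrl(str);
            if (rssURL != null) {
                // Resolve relative hrefs such as "/rss.xml" against the page URL
                String absoluteRssUrl = new URL(urlxml, rssURL).toString();
                // Skip a link that points back at this page, to avoid endless recursion
                if (!url.equals(absoluteRssUrl)) {
                    return downloadXml(absoluteRssUrl, filename);
                }
            }
        } else { // if (isRss(str)) {
            // For now, we'll assume that we're an RSS feed at this point.
            // A relative File resolves against "/" on Android, so write to the
            // app's private files directory (getFilesDir() is a Context method).
            FileUtils.write(new File(getFilesDir(), filename + ".xml"), str, "UTF-8");
            return true;
        }
    } catch (Exception e) {
        // do nothing; fall through and report failure
    } finally {
        IOUtils.closeQuietly(is);
    }
    return false;
}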

private boolean isHtml(String str) {
    Pattern pattern = Pattern.compile("<html", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);
    Matcher matcher = pattern.matcher(str);
    return matcher.find();
}

private String extractRssUrl(String str) {
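    // Matches <link ...> with type application/rss+xml or application/atom+xml and
    // captures href whether it comes before or after type (double-quoted attributes only).
    Pattern pattern = Pattern.compile("<link(?:\\s+href=\"([^\"]*)\"|\\s+[a-z\\-]+=\"[^\"]*\")*\\s+type=\"application/(?:rss|atom)\\+xml\"(?:\\s+href=\"([^\"]*)\"|\\s+[a-z\\-]+=\"[^\"]*\")*?\\s*/?>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);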
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        for (int i = 1; i <= matcher.groupCount(); i++) {
            if (matcher.group(i) != null) {
                return matcher.group(i);
            }
        }
    }
    return null;
}

The above code works with your Engadget example:

obj.downloadXml("http://www.engadget.com/", "rss");
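The Engadget page uses an absolute href, but many sites publish a relative one such as href="/rss.xml", which new URL("/rss.xml") rejects with a MalformedURLException because it has no protocol. A minimal sketch of resolving it against the page URL with java.net.URL's two-argument constructor (the FeedUrls class name is hypothetical):

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for turning an extracted href into an absolute feed URL.
final class FeedUrls {
    private FeedUrls() {}

    /** Resolves a possibly relative href against the URL of the page it was found on. */
    static String resolve(String pageUrl, String href) throws MalformedURLException {
        // new URL(context, spec) applies standard relative-reference resolution;
        // an already absolute href is returned unchanged.
        return new URL(new URL(pageUrl), href).toString();
    }
}
```

In downloadXml, this would be applied to rssURL before the recursive call, so that both absolute and relative links lead to a downloadable feed.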