Android: How to download RSS when a website contains: link rel="alternate" type="application/rss+xml"
Question
I am making an RSS-related app.
I want to be able to download the RSS (XML) given only the URL of a website whose source contains:
link rel="alternate" type="application/rss+xml"
For example, the source of http://www.engadget.com contains:
<link rel="alternate" type="application/rss+xml" title="Engadget" href="http://www.engadget.com/rss.xml">
I guess that if I open this website in an RSS application,
it will redirect me to the http://www.engadget.com/rss.xml page.
My code to download the XML is the following:
private boolean downloadXml(String url, String filename) {
    try {
        URL urlxml = new URL(url);
        URLConnection ucon = urlxml.openConnection();
        ucon.setConnectTimeout(4000);
        ucon.setReadTimeout(4000);
        InputStream is = ucon.getInputStream();
        BufferedInputStream bis = new BufferedInputStream(is, 128);
        FileOutputStream fOut = openFileOutput(filename + ".xml", Context.MODE_WORLD_READABLE | Context.MODE_WORLD_WRITEABLE);
        OutputStreamWriter osw = new OutputStreamWriter(fOut);
        int current = 0;
        while ((current = bis.read()) != -1) {
            osw.write((byte) current);
        }
        osw.flush();
        osw.close();
    } catch (Exception e) {
        return false;
    }
    return true;
}
Without knowing the 'http://www.engadget.com/rss.xml' URL, how can I download the RSS when I input 'http://www.engadget.com'?
Recommended answer
To accomplish this, you need to:
- Detect whether the URL points to an HTML file. See the isHtml method in the code below.
- If the URL points to an HTML file, extract an RSS URL from it. See the extractRssUrl method in the code below.
The following code is a modified version of the code you pasted in your question. For I/O, I used Apache Commons IO for its useful IOUtils and FileUtils classes. IOUtils.toString is used to convert an input stream to a string, as recommended in "In Java, how do I read/convert an InputStream to a String?" (http://stackoverflow.com/questions/309424/in-java-how-do-i-read-convert-an-inputstream-to-a-string).
extractRssUrl uses regular expressions to parse HTML, even though doing so is highly frowned upon. (See the rant in "RegEx match open tags except XHTML self-contained tags": http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags.) With this in mind, treat extractRssUrl as a starting point. Its regular expression is rudimentary and doesn't cover all cases.
Note that a call to isRss(str) is commented out. If you want to do RSS detection, see "How to detect if a page is an RSS or ATOM feed."
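As a rough illustration of what such a check could look like, here is a minimal sketch that parses the downloaded text as XML and inspects the root element. The class name FeedSniffer and the method isFeed are hypothetical and not part of the answer's code. It assumes that a root element named rss, feed, or RDF means a feed, which is a heuristic rather than a full validation. Note also that a document with an external DOCTYPE may cause the parser to try to fetch the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: a heuristic stand-in for the commented-out isRss(str).
final class FeedSniffer {

    // Returns true if the text is well-formed XML whose root element
    // looks like RSS 2.0 (<rss>), Atom (<feed>) or RSS 1.0 (<rdf:RDF>).
    static boolean isFeed(String content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() drops any prefix
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            Element root = builder
                    .parse(new InputSource(new StringReader(content)))
                    .getDocumentElement();
            String name = root.getLocalName();
            return "rss".equals(name) || "feed".equals(name) || "RDF".equals(name);
        } catch (Exception e) {
            // Not well-formed XML (typical for real-world HTML): not a feed.
            return false;
        }
    }
}
```

With something like this in place, the else branch in downloadXml below could become else if (FeedSniffer.isFeed(str)), so that content that is neither HTML nor a feed is not written to disk.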
private boolean downloadXml(String url, String filename) {
    InputStream is = null;
    try {
        URL urlxml = new URL(url);
        URLConnection ucon = urlxml.openConnection();
        ucon.setConnectTimeout(4000);
        ucon.setReadTimeout(4000);
        is = ucon.getInputStream();
        String str = IOUtils.toString(is, "UTF-8");
        if (isHtml(str)) {
            String rssURL = extractRssUrl(str);
            if (rssURL != null && !url.equals(rssURL)) {
                return downloadXml(rssURL, filename + ".xml");
            }
        } else { // if (isRss(str)) {
            // For now, we'll assume that we're an RSS feed at this point
            FileUtils.write(new File(filename), str);
            return true;
        }
    } catch (Exception e) {
        // do nothing
    } finally {
        IOUtils.closeQuietly(is);
    }
    return false;
}

private boolean isHtml(String str) {
    Pattern pattern = Pattern.compile("<html", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);
    Matcher matcher = pattern.matcher(str);
    return matcher.find();
}

private String extractRssUrl(String str) {
    Pattern pattern = Pattern.compile("<link(?:\\s+href=\"([^\"]*)\"|\\s+[a-z\\-]+=\"[^\"]*\")*\\s+type=\"application/rss\\+(?:xml|atom)\"(?:\\s+href=\"([^\"]*)\"|\\s+[a-z\\-]+=\"[^\"]*\")*?\\s*/?>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        for (int i = 1; i <= matcher.groupCount(); i++) {
            if (matcher.group(i) != null) {
                return matcher.group(i);
            }
        }
    }
    return null;
}
The above code works with your Engadget example:
obj.downloadXml("http://www.engadget.com/", "rss");
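Since the regular expression in extractRssUrl is only a starting point, here is one possible way to make the extraction a little more tolerant. It is still regex-based and still not a real HTML parser. The class FeedLinkExtractor and its method extractFeedUrl are hypothetical names, not part of the code above. The sketch assumes that attribute values are quoted (single or double quotes), accepts attributes in any order, also accepts the application/atom+xml type, and resolves a relative href against the page URL, which should include a path (for example a trailing slash).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a more tolerant variant of extractRssUrl.
final class FeedLinkExtractor {

    // Matches a whole <link ...> tag; group 1 holds the raw attribute text.
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches name="value" or name='value' pairs inside a tag.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z\\-:]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Returns the absolute URL of the first RSS/Atom <link> found, or null.
    // May throw IllegalArgumentException if pageUrl or href is not a valid URI.
    static String extractFeedUrl(String html, String pageUrl) {
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            // Collect the tag's attributes, keyed by lower-cased name.
            Map<String, String> attrs = new HashMap<>();
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                attrs.put(attr.group(1).toLowerCase(Locale.ROOT), value);
            }
            String rel = attrs.get("rel");
            String type = attrs.get("type");
            String href = attrs.get("href");
            if (rel == null || type == null || href == null) {
                continue;
            }
            String normalizedType = type.trim().toLowerCase(Locale.ROOT);
            boolean feedType = normalizedType.equals("application/rss+xml")
                    || normalizedType.equals("application/atom+xml");
            if (feedType && rel.trim().equalsIgnoreCase("alternate")) {
                // Resolve relative links such as "/rss.xml" against the page URL.
                return URI.create(pageUrl).resolve(href.trim()).toString();
            }
        }
        return null;
    }
}
```

In downloadXml, the call extractRssUrl(str) could then be swapped for FeedLinkExtractor.extractFeedUrl(str, url). For pages with messy markup, a real HTML parser is likely the more robust choice.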