Feedparser - retrieve old messages from Google Reader

Views: 104 | Published: 2020/6/17 18:50:13 | Tags: python rss google-reader feedparser

Problem description

I'm using the feedparser library in Python to retrieve news from a local newspaper (my intent is to do Natural Language Processing over this corpus), and I would like to be able to retrieve many past entries from the RSS feed.

I'm not very familiar with the technical details of RSS, but I think this should be possible (I can see that, e.g., Google Reader and Feedly can do this "on demand" as I move the scrollbar).

When I do the following:

import feedparser

# RSS feed of the newspaper
url = 'http://feeds.folha.uol.com.br/folha/emcimadahora/rss091.xml'
feed = feedparser.parse(url)

# Iterate over the entries the feed currently exposes
for post in feed.entries:
    title = post.title

I get only a dozen entries or so. I was hoping for hundreds, maybe all entries from the last month if possible. Is it possible to do this with feedparser alone?

I intend to take only the link to each news item from the RSS feed and parse the full page with BeautifulSoup to obtain the text I want. An alternative would be a crawler that follows all local links in the page to collect many news items, but I want to avoid that for now.
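
The two steps of that plan (pull the item links out of the feed, then pull the article text out of each page) can be sketched as follows. This is only an illustration under stated assumptions: it swaps in the standard library (`xml.etree.ElementTree` and `html.parser`) for feedparser and BeautifulSoup so that it runs without extra packages, the names `extract_links`, `extract_text`, `SAMPLE_RSS` and `SAMPLE_HTML` are hypothetical, and treating `<p>` elements as the article body is a guess about the page layout, not something known about the newspaper's site.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample data standing in for a downloaded feed and article page
SAMPLE_RSS = (
    '<rss version="0.91"><channel><title>Sample</title>'
    '<item><title>A</title><link>http://example.com/a</link></item>'
    '<item><title>B</title><link> http://example.com/b </link></item>'
    '</channel></rss>'
)
SAMPLE_HTML = (
    '<html><body><h1>Head</h1><p>First <b>bold</b> paragraph.</p>'
    '<div>menu</div><p>Second.</p></body></html>'
)


def extract_links(rss_xml):
    """Return the <link> text of every <item> in an RSS document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements (assumed article body)."""

    def __init__(self):
        super().__init__()
        self._depth = 0          # how many <p> elements are currently open
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Text outside <p> (headings, menus) is ignored
        if self._depth > 0:
            self.paragraphs[-1] += data


def extract_text(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


if __name__ == "__main__":
    for link in extract_links(SAMPLE_RSS):
        print(link)
    print(extract_text(SAMPLE_HTML))
```

In a real run, each link would be downloaded first and the resulting HTML passed to `extract_text`; with feedparser and BeautifulSoup installed, those libraries would take the place of the two helpers.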

One solution I came across is to use the Google Reader RSS cache:

http://www.google.com/reader/atom/feed/http://feeds.folha.uol.com.br/folha/emcimadahora/rss091.xml?n=1000
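
The URL above follows a simple pattern: a fixed prefix, the feed address, and an `n` parameter for the number of entries. A minimal sketch of building it in Python is shown below. The helper name `reader_feed_url` and the constant `READER_FEED_BASE` are hypothetical; the endpoint and the `n` parameter are taken from the question as-is, and percent-encoding any `?` or `&` in the feed address is an assumption about what the service would expect. The sketch only builds the string and does not handle the login.

```python
from urllib.parse import quote, urlencode

# Prefix copied from the URL in the question (not verified against any API reference)
READER_FEED_BASE = "http://www.google.com/reader/atom/feed/"


def reader_feed_url(feed_url, n=1000):
    """Build the cache URL for feed_url, asking for up to n entries."""
    # Keep ":" and "/" readable, as in the question; encode everything else
    # that could be mistaken for part of the outer query string.
    encoded_feed = quote(feed_url, safe=":/")
    return READER_FEED_BASE + encoded_feed + "?" + urlencode({"n": n})


if __name__ == "__main__":
    print(reader_feed_url(
        "http://feeds.folha.uol.com.br/folha/emcimadahora/rss091.xml"))
```

For the feed in the question, this yields the same address as the one shown above.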

But to access this I must be logged in to Google Reader. Does anyone know how I can do that from Python? (I really don't know a thing about the web; I usually only work with numerical calculus.)
