How large RSS readers work (Netvibes, Google Reader...)


Question

I wonder how web applications like Google Reader, Bloglines and Technorati work, and what techniques they use to parse millions of RSS feeds at a time using cron jobs?

Recommended answer

There are a lot of different techniques. The "worst" one is the one you describe (time-based polling).

The first thing you need to consider is that they may not all do the parsing on the server side. For example, I know that Netvibes was doing the parsing on the client side (while caching the content on the server), which saved them a lot of resources. This way they would poll feeds only when users asked for them, so there was no need for them to run any kind of time loop.
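To make the on-demand idea concrete, here is a minimal sketch of a server-side cache that fetches a feed only when a user asks for it. This is not how Netvibes actually implemented it; the class name `OnDemandFeedCache`, the `ttl_seconds` parameter and the helper `fetch_over_http` are all hypothetical names chosen for illustration.

```python
import time
import urllib.request


def fetch_over_http(url):
    """Hypothetical default fetcher: download the raw feed body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class OnDemandFeedCache:
    """Fetch a feed only when someone requests it, then reuse the copy.

    No background loop exists: a feed nobody reads is never polled.
    """

    def __init__(self, fetch=fetch_over_http, ttl_seconds=300.0,
                 clock=time.monotonic):
        self._fetch = fetch          # callable(url) -> feed body
        self._ttl = ttl_seconds      # how long a cached copy stays fresh
        self._clock = clock          # injectable so it can be tested
        self._entries = {}           # url -> (fetched_at, body)

    def get(self, url):
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # still fresh: no network request
        body = self._fetch(url)      # stale or missing: fetch on demand
        self._entries[url] = (now, body)
        return body
```

With this shape, many users requesting the same feed within the freshness window cause a single upstream request, and the raw body can be handed to the browser for parsing.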

Time-based polling is still, unfortunately, the most common solution. There are a lot of techniques to determine the best time to poll: based on the frequency of past updates, based on the number of users who subscribed, etc. The old XML-RPC ping servers can also be used by these services.
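As an illustration of choosing when to poll, the sketch below derives an interval from past update times and the subscriber count. The formula (half the median gap between updates, divided by a logarithmic popularity factor, clamped to a range) is an assumption made for this example, not the heuristic of any particular reader; `next_poll_interval` and its parameters are hypothetical names.

```python
import math
from statistics import median


def next_poll_interval(update_times, subscribers,
                       min_interval=300.0, max_interval=86400.0):
    """Return the number of seconds to wait before polling a feed again.

    update_times: timestamps (in seconds) of previously seen updates.
    subscribers:  number of users subscribed to the feed.
    """
    if len(update_times) < 2:
        # No history yet: start from the slowest rate.
        base = max_interval
    else:
        ordered = sorted(update_times)
        gaps = [later - earlier
                for earlier, later in zip(ordered, ordered[1:])]
        # Poll about twice per typical gap between updates.
        base = median(gaps) / 2

    # Popular feeds get polled more eagerly (1x at 1 user, 3x at 100).
    popularity = 1.0 + math.log10(max(subscribers, 1))

    # Clamp so no feed is hammered or forgotten entirely.
    return min(max(base / popularity, min_interval), max_interval)
```

For a feed that updated hourly and has a single subscriber this yields a 30-minute interval; the same feed with 100 subscribers is polled every 10 minutes.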

The most efficient technique is to use PubSubHubbub, which is used by Google Reader, Netvibes and a few thousand other apps (like Digg.com, Twitterfeed, Friendfeed...). It's an open protocol that allows the feed publisher to push the content of the feed directly to subscribing applications. It's very efficient, but requires the publisher to implement it. Luckily, all the big blogging platforms (Tumblr, Posterous, WordPress, Blogger, SixApart, etc.) have implemented it. Other feed publishing apps (like FeedBurner, Gowalla...) have implemented it as well. If you publish feeds, I would encourage you to join this crowd, and if you plan on consuming some, please implement the subscriber side as well.
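The subscriber side can be sketched roughly as follows. The `hub.*` parameter names and the `sha1=` signature format come from the PubSubHubbub specification, but the function names are hypothetical and details vary between hub versions (some hubs require extra fields such as `hub.verify`), so treat this as an outline rather than a complete client.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def build_subscription_body(topic_url, callback_url, secret=None):
    """Form-encoded body to POST to the hub to request a subscription."""
    params = {
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed we want pushed to us
        "hub.callback": callback_url,  # where the hub should deliver it
    }
    if secret is not None:
        params["hub.secret"] = secret  # lets us authenticate deliveries
    return urlencode(params).encode("ascii")


def answer_verification(query, expected_topics):
    """Handle the hub's GET to our callback; returns (status, body).

    The hub confirms intent by sending a challenge that we echo back.
    """
    if (query.get("hub.mode") in ("subscribe", "unsubscribe")
            and query.get("hub.topic") in expected_topics
            and "hub.challenge" in query):
        return 200, query["hub.challenge"]
    return 404, ""                     # not a subscription we asked for


def signature_is_valid(secret, body, signature_header):
    """Check the X-Hub-Signature header of a pushed notification."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha1)
    expected = "sha1=" + digest.hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

Once the subscription is verified, the hub POSTs new feed content to the callback URL, so the subscriber never has to poll that feed.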

The last solution is to use a third-party application to do this data gathering (using all the techniques above) and ping you when these feeds actually have new content. I created one, Superfeedr, and I believe we do a good job with this. We also normalize the content and do a few other things to help you consume feed data in the simplest and cheapest way (polling can be crazy expensive). Also, we use the exact same PubSubHubbub protocol to push content from any feed, which makes it very simple for our users to use our service in addition to subscribing to available hubs.

Also, I should add that I was able to reply quickly to your question because I use an app that pushes me the content of the feed for questions tagged RSS :)
