How to get updates from a website?


Question

I want to create an application that gets updates from my favorite websites.
However, some sites don't have RSS feeds, so how can I generate feeds for them from their URLs?

Recommended answer

No RSS feed, or no feed at all? The situation is simple: no update feed, no update information. The remaining option is to scan the whole site, which is feasible in principle (the Google crawler does it, after all) but does not look very practical.

Scanning the whole site is possible, but with a number of limitations. The content of a page can be dynamic even when the server-side part of the code is not updated, so such an update is not detectable in principle. Also, the site graph can be disconnected (http://en.wikipedia.org/wiki/Connected_space), so there is no regular way to discover URLs that are not reachable from the head page.
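To illustrate the reachability limitation, here is a minimal Python sketch of a breadth-first crawl that only follows links within one site. It is not part of the original answer: the `crawl` function, the `fetch` parameter, and the in-memory `SITE` dictionary are assumptions made for the example, so that it runs without network access. A real application would pass a function that downloads pages and would also need to respect robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; returns {url: html}.

    `fetch` is a caller-supplied function (hypothetical here) that takes a
    URL and returns the page HTML as a string, or None if it is unavailable.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same host and never queue a URL twice.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# A tiny in-memory "site" (made up for this example). The orphan page
# exists, but no other page links to it.
SITE = {
    "http://example.com/": '<a href="/news">News</a>',
    "http://example.com/news": '<a href="/">Home</a> <a href="http://other.example/">Other</a>',
    "http://example.com/orphan": "<p>Nothing links here</p>",
}

found = crawl("http://example.com/", SITE.get)
print(sorted(found))
```

The orphan page is never visited, because no link path leads to it from the head page. This is the disconnected-graph problem described above.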

If these considerations do not discourage you too much, you can learn about and take up Web scraping; see:
http://en.wikipedia.org/wiki/Web_scraping.
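As a rough sketch of how scraped pages could be turned into "update" information, the Python example below fingerprints the visible text of each page and compares it with a stored snapshot. It is only an illustration under stated assumptions, not the method from the original answer: `fingerprint`, `detect_updates`, and the sample pages are hypothetical names and data. Skipping `<script>` and `<style>` content reduces some false alarms, but a page with dynamic visible content (for example, a displayed timestamp) would still be reported as changed on every run.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def fingerprint(html):
    """Return a SHA-256 hex digest of the page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.parts)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_updates(previous, pages):
    """Compare pages with the previous snapshot.

    `previous` maps URL -> digest from the last run; `pages` maps
    URL -> HTML from the current run. Returns (changed_urls, new_snapshot).
    """
    current = {url: fingerprint(html) for url, html in pages.items()}
    changed = sorted(
        url for url, digest in current.items() if previous.get(url) != digest
    )
    return changed, current


# Sample data, made up for this example.
first = {"http://example.com/news": "<h1>News</h1><p>Old story</p>"}
second = {
    "http://example.com/news":
        "<h1>News</h1><script>var t = 2;</script><p>New story</p>"
}

changed_first, snapshot = detect_updates({}, first)        # first run
changed_same, snapshot = detect_updates(snapshot, first)   # same content
changed_later, snapshot = detect_updates(snapshot, second) # text changed
print(changed_first, changed_same, changed_later)
```

The list of changed URLs from each run could then be written out as feed entries. The snapshot dictionary would need to be saved between runs, for example as a JSON file.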

See also my past answers:
How to get the data from another site,
get specific data from web page.

—SA

