Extracting part of a website


Question

Hi, how can I extract a part of a website and show it in a web browser? (It must be updated every hour.)
What should I do?
Please help me.
With respect, "Spaceman"

Solution

This is not exactly how the Web works. There is no such thing as a "part". In the general case, the concept of a "part" is simply undefined.

If you consider only a 100% static Web site, it can be modeled, in the general case, as a graph whose nodes are files ("resources") interconnected by links (such as anchors). Then you could define some subset of those files and extract them using the techniques of Web scraping (http://en.wikipedia.org/wiki/Web_scraping).
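
To make the graph picture concrete, here is a minimal Python sketch (standard library only) of a breadth-first walk over such a site. It assumes plain static HTML pages whose edges are `<a href>` links; the names `crawl_same_host`, `LinkCollector` and `fetch_url`, the UTF-8 decoding and the `max_pages` cap are illustrative choices, not part of any particular tool, and "same site" is crudely approximated as "same scheme and host".

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags: the edges of the graph."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_url(url: str) -> str:
    # Assumes the pages are UTF-8 encoded.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_same_host(start: str, fetch: Callable[[str], str] = fetch_url,
                    max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first walk from `start`, returning {url: html} for each page.

    "Same site" is approximated as "same scheme and host", which is only
    a heuristic: www.example.com and example.com count as different sites.
    """
    origin = urlparse(start)
    seen: Set[str] = {start}
    queue = deque([start])
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            page_html = fetch(url)
        except OSError:
            continue  # unreachable node: skip it instead of aborting the crawl
        pages[url] = page_html
        collector = LinkCollector()
        collector.feed(page_html)
        for href in collector.hrefs:
            # Resolve relative links and drop "#fragment" parts, so that
            # b.html and b.html#top are treated as the same node.
            target, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(target)
            if (parsed.scheme, parsed.netloc) != (origin.scheme, origin.netloc):
                continue  # the edge leaves the "site"
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

Passing a fake `fetch` (for example a dict's `__getitem__` mapping URLs to HTML strings) lets the crawl be tried offline. Note that the set of pages is discovered only as the walk proceeds, so it cannot be known before the crawl starts.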

But the trouble is:
1) the links don't have to point to resources on the same site, and it's not always obvious what "the same site" is;
2) you don't know the set of nodes in advance, before you start scraping;
3) the site does not have to be static: resources can be generated on request, and there is no predefined correspondence between the set of URLs and the set of resources;
4) even scraping the same URL can give different (possibly random) results each time, as is the case with games, for example.

So, only in some special, simple cases, when you know the set of URLs from the very beginning, can you give a reasonable definition of a "part" and actually scrape it. How to do it? Please see my past answers:
get specific data from web page,
How to get the data from another site.
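
For that simple case (a fixed, known list of URLs), one possible shape of an answer to the original question is sketched below in Python. It assumes that the "part" is a single element identified by an `id` attribute; the `SOURCES` mapping, the `headlines` id, the output file name and all function names are hypothetical placeholders. Each refresh fetches the pages, keeps only the text of the chosen element, and writes a local HTML page that the browser reloads hourly via `<meta http-equiv="refresh">`.

```python
import html
import time
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.request import urlopen

# Hypothetical configuration: page URL -> id of the element that is "the part".
SOURCES: Dict[str, str] = {"https://example.com/news.html": "headlines"}
REFRESH_SECONDS = 60 * 60  # the question asks for an update every hour

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementText(HTMLParser):
    """Collects the text inside the first element whose id matches."""

    def __init__(self, element_id: str) -> None:
        super().__init__()
        self.element_id = element_id
        self.stack: List[str] = []  # open tags inside the target element
        self.found = False
        self.done = False
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.stack or dict(attrs).get("id") == self.element_id:
            self.found = True
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to the matching open tag; this also closes tags the
            # page left implicitly open (e.g. an unterminated <p>).
            while self.stack.pop() != tag:
                pass
            self.done = not self.stack

    def handle_data(self, data):
        if self.stack:
            self.chunks.append(data)


def extract_part(page_html: str, element_id: str) -> Optional[str]:
    """Return the whitespace-normalised text of the element, or None."""
    parser = ElementText(element_id)
    parser.feed(page_html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split()) if parser.found else None


def render_page(parts: Dict[str, Optional[str]]) -> str:
    """Build a small HTML page that the browser reloads every hour."""
    sections = []
    for url, text in parts.items():
        body = html.escape(text) if text is not None else "<em>not available</em>"
        sections.append(f"<section><h2>{html.escape(url)}</h2><p>{body}</p></section>")
    return ("<!DOCTYPE html><html><head><meta charset='utf-8'>"
            f"<meta http-equiv='refresh' content='{REFRESH_SECONDS}'>"
            "<title>Extracted parts</title></head><body>"
            + "".join(sections) + "</body></html>")


def fetch_url(url: str) -> str:
    # Assumes the remote pages are UTF-8 encoded.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def refresh_once(out_path: str = "parts.html",
                 fetch: Callable[[str], str] = fetch_url) -> None:
    parts: Dict[str, Optional[str]] = {}
    for url, element_id in SOURCES.items():
        try:
            parts[url] = extract_part(fetch(url), element_id)
        except OSError:  # network error: show the entry as unavailable
            parts[url] = None
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(render_page(parts))


def refresh_forever(out_path: str = "parts.html") -> None:
    while True:
        refresh_once(out_path)
        time.sleep(REFRESH_SECONDS)
```

Calling `refresh_forever()` keeps the file current, and opening `parts.html` in a browser shows the extracted text; a scheduler such as cron could replace the `sleep` loop. Keeping only the escaped text avoids pulling the other site's scripts and relative links into your page, at the cost of losing its formatting. As point 4) above warns, the result is only as stable as the remote page's markup.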

—SA

