Should a web-crawler pick up queries?

Views: 149 Published: 2018/6/25 18:18:30 Tags: html, web-crawler


Question

Over the last few days I have coded a web crawler. The only question I have left is: do "standard" web crawlers crawl links that carry queries, like this one:
https://www.google.se/?q=stackoverflow
or do they skip the query and pick the link up like this:
https://www.google.se

Recommended answer

In case you are referring to crawling for some sort of indexing of web resources:

The full answer is very long, but in short my opinion is this: if the "page/resource" https://www.google.se/?q=stackoverflow is pointed to by many other pages (i.e. it has a large in-link degree), then not integrating it into your index might mean that you miss a very important node in the web graph. On the other hand, imagine how many links of the type google.com/q="query" there are on the web. Probably a huge number, so crawling all of them would certainly be a huge overhead for your crawler/indexer system.
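The trade-off described in this answer could be sketched as a simple policy: keep a query URL only when enough links point at it, and otherwise fall back to the URL without its query. This is a minimal illustration, not how any particular crawler works. The function names `strip_query` and `choose_crawl_targets` and the `min_in_links` threshold are assumptions made up for this example, and the in-link count is simplified to "number of times the link was seen" rather than the number of distinct pages linking to it.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def choose_crawl_targets(discovered_links, min_in_links=3):
    """Decide which URLs to crawl from a list of discovered links.

    discovered_links: iterable of URLs, one entry per link occurrence
        (a simplified stand-in for in-link degree).
    min_in_links: hypothetical threshold; a query URL seen fewer times
        than this is reduced to its query-less form.
    """
    in_links = Counter(discovered_links)
    targets = set()
    for url, count in in_links.items():
        has_query = bool(urlsplit(url).query)
        if has_query and count < min_in_links:
            # Rarely linked query URL: crawl only the base resource.
            targets.add(strip_query(url))
        else:
            # No query, or a heavily linked query URL worth indexing.
            targets.add(url)
    return targets
```

With a list that contains https://www.google.se/?q=stackoverflow three times and some other query URL on the same host once, the first is kept as-is and the second is reduced to https://www.google.se/. Note that `urlsplit` keeps the path `/`, so the stripped form ends with a slash; a real crawler would likely add further URL normalization on top of this.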
