What is the difference between web crawling and web scraping?


Question

Is there a difference between Crawling and Web-scraping?

If there's a difference, what's the best method to use in order to collect some web data to supply a database for later use in a customised search engine?

Recommended answer

Crawling is essentially what Google, Yahoo, MSN, etc. do: looking for ANY information. Scraping is generally targeted at certain websites, for specific data, e.g. for price comparison, so the two are coded quite differently.
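To make the contrast concrete, here is a minimal sketch using only Python's standard library. The sample page, the `price` class name, and the helper names `discover_links` and `extract_price` are made up for illustration; a real crawler or scraper would fetch pages over HTTP and handle far messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page used only for illustration.
PAGE = (
    '<html><body>'
    '<a href="/a">A</a><a href="/b">B</a>'
    '<span class="price">19.99</span>'
    '</body></html>'
)


class LinkCollector(HTMLParser):
    """Crawler-style: collect every link, whatever the page is about."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class PriceExtractor(HTMLParser):
    """Scraper-style: pull one specific field from a known page layout."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_data(self, data):
        text = data.strip()
        if self._in_price and text and self.price is None:
            self.price = float(text)

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def discover_links(html):
    """Return all hrefs found in the page, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def extract_price(html):
    """Return the price found in <span class="price">, or None."""
    parser = PriceExtractor()
    parser.feed(html)
    return parser.price
```

The crawler half is generic and would work on any page; the scraper half only works because it knows where this particular site puts the price, which is why scrapers tend to be written per site.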

Usually a scraper will be bespoke to the websites it is supposed to be scraping, and will do things a (good) crawler wouldn't do, namely:

  • Ignore robots.txt
  • Identify itself as a browser
  • Submit forms with data
  • Execute JavaScript (if required to act like a user)
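The first three points in the list can be sketched with Python's standard library. This is an illustration under stated assumptions, not a recommended setup: the robots.txt content, the `ExampleCrawler` agent name, the browser-like User-Agent string, and the helper names are all hypothetical, and nothing is actually sent over the network.

```python
from urllib import parse, request, robotparser

# Hypothetical robots.txt, parsed in memory so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(url, user_agent="ExampleCrawler"):
    """A well-behaved crawler checks robots.txt before fetching a URL."""
    rules = robotparser.RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    return rules.can_fetch(user_agent, url)


def build_scraper_request(url, form_fields):
    """A scraper often skips the robots.txt check, presents a browser-like
    User-Agent, and submits form data. The request is built but not sent."""
    data = parse.urlencode(form_fields).encode("ascii")
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
    return request.Request(url, data=data, headers=headers)
```

A crawler would call `crawler_may_fetch` and skip any URL for which it returns `False`; the scraper request above carries form data, so `urllib` treats it as a POST. Executing JavaScript is beyond the standard library and would need a headless browser, which is not shown here.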
