How to best develop web crawlers
Problem description
I am used to creating crawlers to compile information, and when I come across a website with info I need, I start a new crawler specific to that site, using shell scripts most of the time and sometimes PHP.
The way I do it is with a simple for loop to iterate over the page list, wget to download each page, and sed, tr, awk, or other utilities to clean the page and grab the specific info I need.
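The loop described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual script: the sample page, the "price" markup, and the sed pattern are all hypothetical placeholders, and the page is created locally so the sketch runs without a network.

```shell
#!/bin/sh
set -eu

# A sample page created locally so the sketch is self-contained;
# in practice this step would be:  wget -q -O page.html "$url"
cat > page.html <<'EOF'
<html><body><span class="price">19.99</span></body></html>
EOF

for page in page.html; do
    # sed pulls the text between the (hypothetical) price tags;
    # tr strips stray spaces from the captured value.
    price=$(sed -n 's/.*<span class="price">\([^<]*\)<\/span>.*/\1/p' "$page" | tr -d ' ')
    echo "$page -> $price"
done
```

In a real run the for loop would iterate over a list of URLs and wget would fetch each one before the extraction step.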
The whole process takes some time, depending on the site, and more to download all the pages. And I often run into an AJAX site that complicates everything.
I was wondering if there are better or faster ways to do this, or even some applications or languages that would help with such work.
Answer
Using regular expressions to parse content is a bad idea that has been covered in questions here countless times.
You should be parsing the document into a DOM tree; then you can pull out any hyperlinks, stylesheets, script files, images, or other external links you want and traverse them accordingly.
Many scripting languages have packages for fetching web pages (e.g. curl for PHP) and for parsing HTML (e.g. Beautiful Soup for Python). Go that route instead of the hacky solution of regular expression matching.