Scrapy or Selenium or Mechanize to scrape web data?


Question

I want to scrape some data from a website.

Basically, the website shows a table with around 50 records. To see more, the user has to click a button, which makes an AJAX call to fetch and show the next 50 records.

I already know Selenium WebDriver (Python) and could do this very quickly with it. But Selenium is more of an automated testing tool, and it is very slow.

I did some research and found that I can do the same thing with Scrapy or Mechanize.

Should I go with Scrapy, Mechanize, or Selenium for this?

Recommended answer

I would recommend a combination of Mechanize and ExecJS (https://github.com/sstephenson/execjs) to execute any JavaScript requests you might come across. I have used those two gems together for quite some time now and they do a great job.

You should choose this over Selenium, because it will be a lot faster than having to render the entire page in a headless browser.
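
To illustrate why skipping the browser can be faster: if the button's AJAX call can be reproduced as a plain HTTP request, the records can be fetched in batches without rendering anything. The answer names Ruby gems; the sketch below shows the same idea in Python, since the question mentions Python. It is only a sketch under assumptions: the endpoint is assumed to return a JSON list, and the `offset` and `limit` parameter names, `fetch_page`, and `iter_records` are hypothetical. The real request should be read from the browser's network tab, and it may need cookies or tokens that this sketch does not handle.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

PAGE_SIZE = 50  # the question says each click loads about 50 records


def fetch_page(base_url: str, offset: int, limit: int = PAGE_SIZE) -> list:
    """Fetch one batch of records from a hypothetical JSON endpoint."""
    # 'offset' and 'limit' are assumed parameter names, not taken from any
    # real site; copy the actual ones from the browser's network tab.
    url = f"{base_url}?{urlencode({'offset': offset, 'limit': limit})}"
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_records(fetch: Callable[[int], list],
                 page_size: int = PAGE_SIZE) -> Iterator:
    """Yield records batch by batch until a short or empty batch arrives."""
    offset = 0
    while True:
        batch = fetch(offset)
        yield from batch
        # A batch smaller than page_size means there is nothing left to load.
        if len(batch) < page_size:
            break
        offset += page_size
```

With a placeholder URL, usage might look like `records = list(iter_records(lambda off: fetch_page("https://example.com/records", off)))`. Passing the fetch function in as a parameter keeps the paging loop separate from the network call, so the loop can be tried against fake data first.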
