Scrapy Crawling Speed is Slow (60 pages / min)

Question

I am experiencing slow crawl speeds with Scrapy (around 1 page / sec). I'm crawling a major website from AWS servers, so I don't think it's a network issue. CPU utilization is nowhere near 100%, and if I start multiple Scrapy processes the crawl speed is much faster.

Scrapy seems to crawl a bunch of pages, hang for several seconds, and then repeat.

I've tried playing with: CONCURRENT_REQUESTS = CONCURRENT_REQUESTS_PER_DOMAIN = 500

but this doesn't really seem to move the needle past about 20.
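For context, these settings normally live in the project's settings.py. The sketch below shows the two settings from the question next to a few other Scrapy settings that can also cap throughput. The values for everything other than the two concurrency settings are illustrative assumptions, not a recommended configuration, and none of them helps if the remote site itself is slowing responses down.

```python
# settings.py (sketch; only the first two values come from the question)

# Upper bounds on in-flight requests, as tried in the question.
CONCURRENT_REQUESTS = 500
CONCURRENT_REQUESTS_PER_DOMAIN = 500

# Assumed values below: other settings worth checking when throughput stalls.
DOWNLOAD_DELAY = 0                # any non-zero delay spaces out requests per domain
AUTOTHROTTLE_ENABLED = False      # AutoThrottle deliberately slows the crawl when latency rises
REACTOR_THREADPOOL_MAXSIZE = 20   # thread pool used for DNS lookups, among other things
DNSCACHE_ENABLED = True           # cache DNS results in memory
LOG_LEVEL = "INFO"                # DEBUG logging adds overhead on large crawls
```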

Recommended answer

Are you sure you are allowed to crawl the destination site at high speed? Many sites enforce a download threshold and start responding slowly "after a while".
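One rough way to check whether the site is slowing you down is to compare response latencies early in the crawl with later ones. Scrapy records the fetch time of each response under `response.meta["download_latency"]`, so those values can be collected in a spider callback and passed to a helper like the one below. The helper name `looks_throttled`, the window size, and the factor are all hypothetical choices for illustration, not part of Scrapy.

```python
from statistics import median


def looks_throttled(latencies, window=20, factor=3.0):
    """Heuristic check for server-side slowdown.

    `latencies` is a list of per-response download times in seconds,
    in the order the responses arrived. Returns True when the median
    of the last `window` values is at least `factor` times the median
    of the first `window` values.
    """
    # Not enough data to compare two separate windows.
    if len(latencies) < 2 * window:
        return False
    baseline = median(latencies[:window])
    recent = median(latencies[-window:])
    return baseline > 0 and recent >= factor * baseline
```

If this returns True for a long crawl while a fresh process starts out fast again, that pattern is consistent with a per-client rate limit on the site's side rather than a Scrapy bottleneck, though it does not prove it.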
