Scrapy Twisted ConnectionLost error
Question
I am learning Scrapy and am having a hard time figuring out this issue. My spider will not crawl the Macy's website and keeps throwing the following error:
[<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Things I've tried so far:
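- Set the headers and robots.txt settings, per this thread: Scrapy Shell: twisted.internet.error.ConnectionLost although USER_AGENT is set
- Changed the user agent, per this thread: How to prevent a twisted.internet.error.ConnectionLost error when using Scrapy?
- Pinned cryptography <2, per this thread: Scrapy Twisted connection lost in a non-clean fashion. No proxy. Already tried headers
- Monkey patching, per this thread: Twisted Python Failure - Scrapy Issues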
I also ran scrapy shell "www.macys.com" from the command prompt and got the same error, so I'm guessing the issue is not with my spider. Could someone please help?
Recommended answer
It seems that the IP address from which you are launching your scraper has been blacklisted.
You might want to read the following: https://doc.scrapy.org/en/latest/topics/practices.html#avoiding-getting-banned
Also, you might want to tune the settings that control how many requests Scrapy sends and how quickly it sends them: CONCURRENT_REQUESTS, DOWNLOAD_DELAY, etc.
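As a rough sketch of what such tuning might look like, the dictionary below collects a few of Scrapy's throttling settings. The name POLITE_SETTINGS and every value in it are assumptions chosen for illustration, not numbers known to work against any particular site, and slowing down may not help if the IP is already blocked.

```python
# Hypothetical values for illustration only; adjust them for the target site.
# The keys are standard Scrapy setting names.
POLITE_SETTINGS = {
    # Total number of requests Scrapy may have in flight at once.
    "CONCURRENT_REQUESTS": 4,
    # Cap on simultaneous requests to any single domain.
    "CONCURRENT_REQUESTS_PER_DOMAIN": 2,
    # Base delay, in seconds, between requests to the same site.
    "DOWNLOAD_DELAY": 2.0,
    # Vary the actual wait around DOWNLOAD_DELAY instead of using a fixed gap.
    "RANDOMIZE_DOWNLOAD_DELAY": True,
    # Let the AutoThrottle extension adapt the delay to server response times.
    "AUTOTHROTTLE_ENABLED": True,
    "AUTOTHROTTLE_START_DELAY": 2.0,
    "AUTOTHROTTLE_MAX_DELAY": 30.0,
}
```

These entries can be copied into the project's settings.py as top-level variables, or the dictionary can be assigned to a spider's custom_settings class attribute so that it applies to that spider only.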