Scrapy Twisted ConnectionLost error

Problem description

I am learning Scrapy and am having a hard time figuring out this issue. My spider will not crawl the Macy's website and keeps throwing the following error:

[<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]

Things I've tried so far:

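  1. Setting headers and robotstxt, per this thread: Scrapy Shell: twisted.internet.error.ConnectionLost although USER_AGENT is set
  2. Changing the user agent, per this thread: How to prevent a twisted.internet.error.ConnectionLost error when using Scrapy?
  3. Pinning cryptography<2, per this thread: Scrapy Twisted Connection lost in a non-clean fashion. No proxy. Already tried headers
  4. Monkey patching, per this thread: Twisted Python Failure - Scrapy Issues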

I also ran scrapy shell "www.macys.com" at the command prompt and got the same error, so I'm guessing the issue is not with my spider. Could someone please help?

Recommended answer

It seems that the IP from which you are launching your scraper has been blacklisted.

You might want to read the following: https://doc.scrapy.org/en/latest/topics/practices.html#avoiding-getting-banned

Also, you might want to tune the settings that control how many requests Scrapy sends and how fast: CONCURRENT_REQUESTS, DOWNLOAD_DELAY, etc.
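As a rough illustration of that tuning, a project's settings.py could slow the crawl down along these lines. This is only a sketch: the setting names are standard Scrapy settings, but every value is an assumption picked for illustration, not something known to work for this site.

```python
# settings.py (sketch): send fewer requests, spaced further apart.
# All values below are illustrative assumptions, not tuned recommendations.

CONCURRENT_REQUESTS = 1             # at most one request in flight overall
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # and at most one per domain
DOWNLOAD_DELAY = 5                  # seconds to wait between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True     # vary the wait between 0.5x and 1.5x DOWNLOAD_DELAY

# Optionally let the AutoThrottle extension adapt the delay to server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5        # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 60         # upper bound on the delay in seconds
```

Slower crawling may lower the chance of being blocked again, but it probably will not undo a block that is already in place on your IP.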
