Scrapy gives URLError: <urlopen error timed out>


Problem description

So I have a Scrapy program I am trying to get off the ground, but I can't get my code to execute; it always fails with the error below.

I can still visit the site using the scrapy shell command, so I know the URLs and everything work.
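
For reference, the shell check looks roughly like this (a sketch using the placeholder URL from above):

scrapy shell "http://www.website.net/stuff.php?"
>>> response.status                                # 200 when the page comes back
>>> response.xpath('//*[@id="content"]').extract_first()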

Here is my code:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from Malscraper.items import MalItem

class MalSpider(CrawlSpider):
  name = 'Mal'
  allowed_domains = ['www.website.net']
  start_urls = ['http://www.website.net/stuff.php?']
  rules = [
    Rule(LinkExtractor(
        allow=['//*[@id="content"]/div[2]/div[2]/div/span/a[1]']),
        callback='parse_item',
        follow=True)
  ]

  def parse_item(self, response):
    mal_list = response.xpath('//*[@id="content"]/div[2]/table/tr/td[2]/')

    for mal in mal_list:
      item = MalItem()
      item['name'] = mal.xpath('a[1]/strong/text()').extract_first()
      item['link'] = mal.xpath('a[1]/@href').extract_first()

      yield item

Here is the traceback:

Traceback (most recent call last):
  File "C:\Users\2015\Anaconda\lib\site-packages\boto\utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 449, in _open
    '_open', req)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>

Edit 2:

So with the scrapy shell command I am able to manipulate my responses, but I just noticed that the exact same error comes up again when visiting the site.

Edit 3:

I am now finding that the error shows up on EVERY website I use the shell command with, but I am still able to manipulate the response.

So how do I verify that I am at least receiving a response from Scrapy when running the crawl command? Right now I don't know whether it is my code or this error that is causing my logs to turn up empty.
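
One way to sanity-check this (a minimal sketch, reusing parse_item from above and assuming Scrapy 1.0+, where spiders expose self.logger) would be to log every response as it arrives:

  def parse_item(self, response):
    # debug-only: confirm the crawl is actually receiving pages
    self.logger.info("got %s from %s", response.status, response.url)
    # ... rest of parse_item unchanged ...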

Here is my settings.py:

BOT_NAME = 'Malscraper'

SPIDER_MODULES = ['Malscraper.spiders']
NEWSPIDER_MODULE = 'Malscraper.spiders'
FEED_URI = 'logs/%(name)s/%(time)s.csv'
FEED_FORMAT = 'csv'

Answer

There's an open Scrapy issue for this problem: https://github.com/scrapy/scrapy/issues/1054

Although it seems to be just a warning on other platforms.

You can disable the S3DownloadHandler (which is causing this error) by adding the following to your Scrapy settings:

DOWNLOAD_HANDLERS = {
  's3': None,
}
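
If you prefer not to touch the project-wide settings, Scrapy 1.0+ also supports a per-spider custom_settings dict; a minimal sketch, applied to the spider from the question:

class MalSpider(CrawlSpider):
  name = 'Mal'
  # per-spider override, equivalent to the settings.py entry above
  custom_settings = {
    'DOWNLOAD_HANDLERS': {'s3': None},
  }
  # ... rest of the spider unchanged ...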
