Missing scheme in request URL


Problem description

I've been stuck on this bug for a while; the error message is as follows:

File "C:\Python27\lib\site-packages\scrapy-0.20.2-py2.7.egg\scrapy\http\request\__init__.py", line 61, in _set_url
            raise ValueError('Missing scheme in request url: %s' % self._url)
            exceptions.ValueError: Missing scheme in request url: h

Scrapy code:

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import Selector
    from scrapy.http import Request
    from spyder.items import SypderItem

    import sys
    import MySQLdb
    import hashlib
    from scrapy import signals
    from scrapy.xlib.pydispatch import dispatcher

    # _*_ coding: utf-8 _*_

    class some_Spyder(CrawlSpider):
        name = "spyder"

        def __init__(self, *a, **kw):
            # catch the spider stopping
            # dispatcher.connect(self.spider_closed, signals.spider_closed)
            # dispatcher.connect(self.on_engine_stopped, signals.engine_stopped)

            self.allowed_domains = "domainname.com"
            self.start_urls = "http://www.domainname.com/"
            self.xpaths = '''//td[@class="CatBg" and @width="25%" 
                          and @valign="top" and @align="center"]
                          /table[@cellspacing="0"]//tr/td/a/@href'''

            self.rules = (
                Rule(SgmlLinkExtractor(restrict_xpaths=(self.xpaths))),
                Rule(SgmlLinkExtractor(allow=('cart.php?')), callback='parse_items'),
                )

            super(spyder, self).__init__(*a, **kw)

        def parse_items(self, response):
            sel = Selector(response)
            items = []
            listings = sel.xpath('//*[@id="tabContent"]/table/tr')

            item = IgeItem()
            item["header"] = sel.xpath('//td[@valign="center"]/h1/text()')

            items.append(item)
            return items

I'm pretty sure it's something to do with the URLs I'm asking Scrapy to follow in the LinkExtractor. When extracting them in the shell they look something like this:

data=u'cart.php?target=category&category_id=826'

Compared to another URL extracted from a working spider:

data=u'/path/someotherpath/category.php?query=someval'
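As an aside, the difference between the two extracted values is that the first is a relative link: it only becomes a full request URL once it is joined against the page it was found on. A minimal stdlib sketch (Python 3 syntax; the base URL is taken from the question's `start_urls` and is illustrative):

```python
from urllib.parse import urljoin

# The relative link extracted above lacks a scheme and host;
# joining it with the page's base URL produces a full URL.
base = "http://www.domainname.com/"
full = urljoin(base, "cart.php?target=category&category_id=826")
print(full)  # http://www.domainname.com/cart.php?target=category&category_id=826
```

Scrapy's link extractors normally perform this join for you, which is why relative `href` values usually work inside a `Rule`.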

I've had a look at a few questions on Stack Overflow, such as Downloading pictures with scrapy, but from reading it I think I may have a slightly different problem.

I also looked at this: http://static.scrapy.org/coverage-report/scrapy_http_request___init__.html

It explains that the error is raised if self.URLs is missing a ":". From looking at the start_urls I've defined, I can't quite see why this error would show, since the scheme is clearly defined.

Answer

start_urls must be a list, not a plain string. When it is a string, Scrapy iterates over it character by character, so the first "URL" it tries to request is just "h", which is exactly what the Missing scheme in request url: h message shows. (allowed_domains should likewise be a list.) Change start_urls to:

self.start_urls = ["http://www.bankofwow.com/"]
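To see why the error names the single character "h", here is a small stdlib-only sketch (Python 3 syntax; this is not Scrapy's actual validation code, and the `first_invalid` helper is illustrative) of the kind of scheme check Scrapy performs on each start URL:

```python
from urllib.parse import urlparse

def first_invalid(start_urls):
    """Return the first entry whose URL has no scheme, or None."""
    # Scrapy iterates over start_urls. If it is a string rather
    # than a list, iteration yields single characters, not URLs.
    for url in start_urls:
        if not urlparse(url).scheme:
            return url
    return None

# As a string: the first "URL" seen is the character 'h'.
print(first_invalid("http://www.domainname.com/"))    # h
# As a list: the one real URL carries an http scheme.
print(first_invalid(["http://www.domainname.com/"]))  # None
```

The same list-vs-string trap applies anywhere Scrapy expects an iterable of URLs or domains.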

