Missing scheme in request URL
Problem description
I've been stuck on this bug for a while. The error message is:
```
File "C:\Python27\lib\site-packages\scrapy-0.20.2-py2.7.egg\scrapy\http\request\__init__.py", line 61, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
exceptions.ValueError: Missing scheme in request url: h
```
Scrapy code:
```python
# -*- coding: utf-8 -*-
# (The encoding declaration only takes effect on the first or second line.)
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.http import Request
from spyder.items import SypderItem
import sys
import MySQLdb
import hashlib
from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher


class some_Spyder(CrawlSpider):
    name = "spyder"

    def __init__(self, *a, **kw):
        # catch the spider stopping
        # dispatcher.connect(self.spider_closed, signals.spider_closed)
        # dispatcher.connect(self.on_engine_stopped, signals.engine_stopped)
        self.allowed_domains = "domainname.com"
        self.start_urls = "http://www.domainname.com/"
        self.xpaths = '''//td[@class="CatBg" and @width="25%"
            and @valign="top" and @align="center"]
            /table[@cellspacing="0"]//tr/td/a/@href'''
        # Rules must exist before CrawlSpider.__init__ compiles them.
        self.rules = (
            Rule(SgmlLinkExtractor(restrict_xpaths=(self.xpaths))),
            # Escaped so '.' and '?' match literally.
            Rule(SgmlLinkExtractor(allow=(r'cart\.php\?',)), callback='parse_items'),
        )
        # super() must name this class, not the spider's "name" string.
        super(some_Spyder, self).__init__(*a, **kw)

    def parse_items(self, response):
        sel = Selector(response)
        items = []
        listings = sel.xpath('//*[@id="tabContent"]/table/tr')
        item = SypderItem()  # the item class actually imported above
        # .extract() turns the SelectorList into plain strings.
        item["header"] = sel.xpath('//td[@valign="center"]/h1/text()').extract()
        items.append(item)
        return items
```
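A side note on the second `Rule`: `allow` takes regular expressions, not literal substrings. In the pattern `'cart.php?'`, `.` matches any character and `?` makes the preceding `p` optional, so it matches more than intended. This is not what causes the "Missing scheme" error, but the sketch below (plain Python `re`, same behavior on 2.7 and 3) compares it with an escaped pattern. The second URL is a hypothetical example chosen to show the difference.

```python
import re

# Pattern as written in the question: '.' matches any character and
# '?' makes the preceding 'p' optional, so this is NOT a literal match.
loose = re.compile('cart.php?')
# Escaped pattern: matches only the literal text "cart.php?".
strict = re.compile(r'cart\.php\?')

urls = [
    "http://www.domainname.com/cart.php?target=category&category_id=826",
    "http://www.domainname.com/cartXph/checkout",  # hypothetical URL
]

for url in urls:
    # Prints the URL followed by whether each pattern matches it.
    print(url, bool(loose.search(url)), bool(strict.search(url)))
```

Both patterns match the first URL, but only the unescaped one also matches `cartXph`.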
I'm pretty sure it has something to do with the URLs I'm asking Scrapy to follow in the LinkExtractor. When I extract them in the Scrapy shell, they look something like this:
data=u'cart.php?target=category&category_id=826'
Compare that with a URL extracted by a working spider:
data=u'/path/someotherpath/category.php?query=someval'
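Relative links alone are unlikely to be the culprit: Scrapy's link extractors resolve each extracted `href` against the URL of the page it came from before a `Request` is built, so both forms above end up absolute. The sketch below shows that resolution with the standard library's `urljoin` (Python 3's `urllib.parse`; on Python 2.7 the same function lives in `urlparse`). The page URL is a hypothetical example. Note also that the error reports the URL as just `h`, a single character, which does not look like either extracted link.

```python
from urllib.parse import urljoin  # Python 2.7: from urlparse import urljoin

# Hypothetical page on which the links were found.
page_url = "http://www.domainname.com/shop/index.php"

relative = u'cart.php?target=category&category_id=826'
root_relative = u'/path/someotherpath/category.php?query=someval'

# A path-relative link is resolved against the page's directory.
print(urljoin(page_url, relative))
# http://www.domainname.com/shop/cart.php?target=category&category_id=826

# A root-relative link is resolved against the site root.
print(urljoin(page_url, root_relative))
# http://www.domainname.com/path/someotherpath/category.php?query=someval
```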
I've looked at a few questions on Stack Overflow, such as "Downloading pictures with scrapy", but from reading them I think my problem may be slightly different.
I also looked at this: http://static.scrapy.org/coverage-report/scrapy_http_request___init__.html
It explains that the error is raised when the request URL (self._url) does not contain a ":". Looking at the start_urls I've defined, I can't see why this error would appear, since the scheme (http://) is clearly there.
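The check described in that coverage report can be sketched in a few lines. This is a simplified stand-in, not Scrapy's code: `has_scheme` and `make_request_url` are hypothetical names, and the real `Request._set_url` also normalizes the URL before testing for `":"`. It shows why a one-character string such as `h` trips the check while a full URL does not.

```python
def has_scheme(url):
    """Hypothetical helper: the validation only looks for a ':' anywhere."""
    return ':' in url


def make_request_url(url):
    """Simplified stand-in for the validation step in Request._set_url."""
    if not has_scheme(url):
        raise ValueError('Missing scheme in request url: %s' % url)
    return url


print(make_request_url("http://www.domainname.com/"))  # http://www.domainname.com/

try:
    make_request_url("h")
except ValueError as exc:
    print(exc)  # Missing scheme in request url: h
```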
Recommended answer
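Change start_urls to:
```python
self.start_urls = ["http://www.bankofwow.com/"]
```
start_urls must be a list of URLs, not a single string. Scrapy loops over start_urls to build the initial requests, and looping over a string yields one character at a time, so the first "URL" it tries is "h", which has no scheme. That is exactly what the error reports: Missing scheme in request url: h. allowed_domains should likewise be a list of domain strings rather than a single string.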
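To see why a plain string fails, here is a simplified sketch of the loop a spider runs over `start_urls` (in Scrapy 0.20, `Spider.start_requests` iterates over `self.start_urls` and builds one request per item). `start_request_urls` is a hypothetical stand-in that collects the values instead of creating `Request` objects.

```python
def start_request_urls(start_urls):
    """Hypothetical stand-in for Spider.start_requests: one entry per item
    obtained by iterating over start_urls."""
    return [url for url in start_urls]


# A plain string is iterable too, so each character becomes a "URL".
as_string = start_request_urls("http://www.domainname.com/")
print(as_string[:4])  # ['h', 't', 't', 'p']

# A list yields whole URLs, which is what Scrapy expects.
as_list = start_request_urls(["http://www.domainname.com/"])
print(as_list)  # ['http://www.domainname.com/']
```

The first element of the string version is `'h'`, which matches the URL named in the error message.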