How do I fix the Scrapy "Unsupported URL scheme" error?
Problem description
I collect a URL from a Python command and then insert it into start_urls:
from flask import Flask, jsonify, request
import scrapy
import subprocess


class ClassSpider(scrapy.Spider):
    name = 'mySpider'
    #start_urls = []
    #pages = 0
    news = []

    def __init__(self, url, nbrPage):
        self.pages = nbrPage
        self.start_urls = []
        self.start_urls.append(url)

    # Scrapy calls parse() with the downloaded response
    def parse(self, response):
        ...

    def run(self):
        subprocess.check_output(['scrapy', 'crawl', 'mySpider', '-a', f'url={self.start_urls}', '-a', f'nbrPage={self.pages}'])
        return self.news


app = Flask(__name__)
data = []


@app.route('/', methods=['POST'])
def getNews():
    mySpider = ClassSpider(request.json['url'], 2)
    return jsonify({'data': mySpider.run()})


if __name__ == "__main__":
    app.run(debug=True)
I got this error:
    raise NotSupported("Unsupported URL scheme '%s': %s" %
scrapy.exceptions.NotSupported: Unsupported URL scheme '': no handler available for that scheme
When I add
    print('my urls List: ' + str(self.start_urls))
it prints the list of URLs, for example: my urls List: ['www.googole.com']
Any help, please.
Recommended answer
I guess this happens because you first append url to self.start_urls and then, in ClassSpider's run method, pass the whole list self.start_urls as the url argument. The spider started by scrapy crawl appends that value to its own start_urls, so you end up with a nested list (in practice, the list's string form, such as "['www.googole.com']") instead of a list of URL strings.
To avoid this, you could change your __init__ method like this:
    def __init__(self, url, nbrPage):
        self.pages = nbrPage
        self.url = url
        self.start_urls = []
        self.start_urls.append(url)
Then, in run, pass self.url instead of self.start_urls:
    def run(self):
        subprocess.check_output(['scrapy', 'crawl', 'mySpider', '-a', f'url={self.url}', '-a', f'nbrPage={self.pages}'])
        return self.news
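To see why passing the string rather than the list matters, here is a minimal sketch that only builds the argument list, without running Scrapy. The helper name build_crawl_args is hypothetical and not part of the original code; the URL is the one from the question. It also checks the URL with the standard library's urllib.parse.urlparse, on the assumption that the empty scheme '' in the error message may come from a URL that has no http:// or https:// prefix.

```python
from urllib.parse import urlparse


def build_crawl_args(url, nbr_page):
    """Hypothetical helper: build the command that run() hands to subprocess."""
    return ['scrapy', 'crawl', 'mySpider',
            '-a', f'url={url}', '-a', f'nbrPage={nbr_page}']


start_urls = ['www.googole.com']

# Formatting the whole list puts its string form, brackets and quotes included,
# into the argument.
wrong_args = build_crawl_args(start_urls, 2)
print(wrong_args[4])  # url=['www.googole.com']

# Formatting the single URL string passes just the URL.
right_args = build_crawl_args(start_urls[0], 2)
print(right_args[4])  # url=www.googole.com

# A URL without a prefix such as http:// has an empty scheme.
bare_scheme = urlparse(start_urls[0]).scheme
full_scheme = urlparse('http://' + start_urls[0]).scheme
print(repr(bare_scheme), repr(full_scheme))  # '' 'http'
```

If the scheme turns out to be empty, as it is here, adding http:// or https:// to the URL before handing it to the spider may also be needed. That is an assumption based on the error text, not something the answer above states.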