python - Using Scrapy: how do I request a new URL and have a specific callback function invoked?
Problem description
A question about using Scrapy with Python 3:
import json
import re

import scrapy
from bs4 import BeautifulSoup
from scrapy.http import Request

from ..items import ZhibobaItem


class Myspider(scrapy.Spider):
    name = 'zhiboba'
    allowed_domains = ['zhibo8.cc']
    json_url = 'https://bifen4pc.qiumibao.com/json/list.htm?85591'
    bash_url = 'https://www.zhibo8.cc/'

    def start_requests(self):
        yield Request(self.bash_url, self.parse_index)

    def parse_index(self, response):
        print("enter the parse_index")
        print(self.bash_url)
        # Select every element whose label attribute contains "足球" (football).
        divs = BeautifulSoup(response.text, 'lxml').find_all(label=re.compile("足球"))
        item = ZhibobaItem()
        for single_div in divs:
            item['label'] = single_div.get('label')
            item['sdate'] = single_div.get('data-time')
            # urljoin resolves the href against the page URL.
            item['linkurl'] = response.urljoin(single_div.find('a')['href'])
            home_team = single_div.get_text().split()[2]
            item['home_team'] = home_team
            visit_team = single_div.get_text().split()[4]
            item['visit_team'] = visit_team
        print("quit the parse_index")
        print(self.json_url)
        # meta carries the team names from the last matched element.
        yield Request(self.json_url, callback=self.get_score,
                      meta={'home_team': home_team, 'visit_team': visit_team})

    def get_score(self, response):
        print("enter the get_score")
        # Scrapy has already downloaded json_url; the body is in response.text.
        wbdata = response.text
        data = json.loads(wbdata)
        news = data['list']
        print(wbdata)
        print("quit the get_score")
When I run the code above, the request for json_url is never made and its callback, get_score, is never called. What is wrong?
Solution
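Try changing allowed_domains. The spider restricts itself to zhibo8.cc, but json_url is on bifen4pc.qiumibao.com, so Scrapy's offsite filtering drops that request before it is sent and get_score is never called. Either set allowed_domains = [] to remove the restriction, or add 'qiumibao.com' to the list.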
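To see why the request is dropped, the following is a minimal sketch of the kind of domain check an offsite filter performs. It is a simplified stand-in written for illustration, not Scrapy's own implementation; the function name is_offsite is hypothetical. It assumes the usual rule that a host is allowed when it equals an allowed domain or is a subdomain of one, and that an empty list allows everything.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Simplified stand-in for an offsite check (hypothetical, not Scrapy's code)."""
    # An empty list means there is no domain restriction at all.
    if not allowed_domains:
        return False
    host = urlparse(url).hostname or ""
    # A host is allowed if it equals an allowed domain or is a subdomain of one.
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


json_url = "https://bifen4pc.qiumibao.com/json/list.htm?85591"

print(is_offsite(json_url, ["zhibo8.cc"]))                  # True: the request would be dropped
print(is_offsite(json_url, ["zhibo8.cc", "qiumibao.com"]))  # False
print(is_offsite(json_url, []))                             # False
```

With the spider's original setting, only the first case applies, which matches the symptom in the question: parse_index runs, but get_score never does.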