Unable to scrape Myntra API data using the Scrapy framework: 307 redirect error
Problem description
import scrapy


class MyntraSpider(scrapy.Spider):
    custom_settings = {
        'HTTPCACHE_ENABLED': False,
        'dont_redirect': True,
        #'handle_httpstatus_list' : [302,307],
        #'CRAWLERA_ENABLED': False,
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
    }
    name = "heytest"
    allowed_domains = ["www.myntra.com"]
    start_urls = ["https://www.myntra.com/web/v2/search/data/duke"]

    def parse(self, response):
        self.logger.debug('Parsed jabong.com')
"Parsed jabong.com" is not getting logged. In fact, the callback method (parse) is never called. Please advise.
Please find the error logs from Scrapinghub:
See also the Postman screenshot.
Recommended answer
I ran this code (only a few times) and had no problem getting the data.
It looks similar to your code, so I don't know why you are having a problem.
Maybe they are blocking you for some reason.
#!/usr/bin/env python3

import scrapy
import json


class MySpider(scrapy.Spider):

    name = 'myspider'
    allowed_domains = ['www.myntra.com']
    start_urls = ['https://www.myntra.com/web/v2/search/data/duke']

    #def start_requests(self):
    #    for tag in self.tags:
    #        for page in range(self.pages):
    #            url = self.url_template.format(tag, page)
    #            yield scrapy.Request(url)

    def parse(self, response):
        print('url:', response.url)
        #print(response.body)

        # the endpoint returns JSON, so decode the body instead of using selectors
        data = json.loads(response.body)
        print('data.keys():', data.keys())
        print('meta:', data['meta'])
        print("data['data']:", data['data'].keys())

        # download files
        #for href in response.css('img::attr(href)').extract():
        #    url = response.urljoin(href)
        #    yield {'file_urls': [url]}

        # download images and convert to JPG
        #for src in response.css('img::attr(src)').extract():
        #    url = response.urljoin(src)
        #    yield {'image_urls': [url]}


# --- it runs without project and saves in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    #'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',

    # save in CSV or JSON
    'FEED_FORMAT': 'csv',      # or 'json'
    'FEED_URI': 'output.csv',  # or 'output.json'

    # download files to `FILES_STORE/full`
    # it needs `yield {'file_urls': [url]}` in `parse()`
    #'ITEM_PIPELINES': {'scrapy.pipelines.files.FilesPipeline': 1},
    #'FILES_STORE': '/path/to/valid/dir',

    # download images and convert to JPG
    # it needs `yield {'image_urls': [url]}` in `parse()`
    #'ITEM_PIPELINES': {'scrapy.pipelines.images.ImagesPipeline': 1},
    #'IMAGES_STORE': '/path/to/valid/dir',

    #'HTTPCACHE_ENABLED': False,
    #'dont_redirect': True,
    #'handle_httpstatus_list' : [302,307],
    #'CRAWLERA_ENABLED': False,
})
c.crawl(MySpider)
c.start()
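One more thing that may be worth checking: `dont_redirect` and `handle_httpstatus_list` are `Request.meta` keys (the latter can also be a spider attribute), not Scrapy settings, so putting them in `custom_settings` has no effect. If you want the 307 response to reach your callback so you can see where it points, they have to be set on the request itself. Below is a minimal sketch of that idea, not a confirmed fix for this site. The names `NO_REDIRECT_META` and `inspect_response` are made up for illustration, and the helper only assumes the payload is a JSON object, as in the answer above.

```python
import json

# Request.meta keys read by Scrapy's redirect and HTTP-error middlewares.
# They belong on each request, not in custom_settings.
NO_REDIRECT_META = {
    "dont_redirect": True,
    "handle_httpstatus_list": [302, 307],
}

# HTTP status codes that signal a redirect.
REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def inspect_response(status, headers, body):
    """Summarise a response (hypothetical helper).

    For a redirect, report the Location header so you can see where the
    server is sending you. Otherwise decode the body as JSON and report
    its top-level keys.
    """
    if status in REDIRECT_STATUSES:
        return {"status": status, "redirect_to": headers.get("Location")}
    data = json.loads(body)
    return {"status": status, "keys": sorted(data)}


# Wiring it into a spider (needs Scrapy installed):
#
#     def start_requests(self):
#         for url in self.start_urls:
#             yield scrapy.Request(url, meta=NO_REDIRECT_META,
#                                  callback=self.parse)
#
#     def parse(self, response):
#         location = response.headers.get("Location", b"").decode()
#         summary = inspect_response(response.status,
#                                    {"Location": location},
#                                    response.body)
#         self.logger.info(summary)
```

With this in place the callback should run even for a 307, and the logged `redirect_to` value shows whether the server is bouncing you to a block page, a login page, or a different host that `allowed_domains` would filter out.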