Python + Scrapy + JSON + XPath: How to scrape JSON data with Scrapy

Views: 394 Published: 2019/11/26 18:59:11 Tags: python json xpath scrapy

Question

I know how to write XPaths that extract HTML data points with Scrapy. But I need to scrape all the event URLs (the starting URLs) from this page, where they are embedded in JSON format:

https://highape.com/bangalore/all-events

View source: https://highape.com/bangalore/all-events

I usually write the parse callback in this format:

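from scrapy import Request

# Inside the spider class:
def parse(self, response):
    # Placeholder: the expression that selects the event URLs goes here
    events = response.xpath('**What To Write Here?**').extract()

    for event in events:
        # Resolve relative links against the page URL
        absolute_url = response.urljoin(event)
        yield Request(absolute_url, callback=self.parse_event)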

Please tell me what I should write in the 'What To Write Here?' portion.

Recommended answer

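View the page source of the URL; the JSON data sits on lines 76 - 9045. You can copy those lines and save them as data.json on your local drive to inspect the structure. Note that the code below does not read data.json: it fetches the page and parses the JSON straight from the sixth script tag.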

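import json

import requests
from bs4 import BeautifulSoup

# Fetch the listing page and parse its HTML
req = requests.get('https://highape.com/bangalore/all-events')
soup = BeautifulSoup(req.content, 'html.parser')

# The event data is the JSON inside the sixth <script> tag (index 5)
js = soup.find_all('script')[5].text
# strict=False allows control characters such as raw newlines inside strings
data = json.loads(js, strict=False)

for i in data:
    url = i['url']
    print(url)
    # In a Scrapy spider, yield a Request for each url here instead of printing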
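
To connect this back to the Scrapy callback in the question, here is a minimal sketch of the JSON-to-URL step using only the standard library. It assumes, as the code above does, that the script tag holds a JSON list of objects that each carry a 'url' key. The function name extract_event_urls and the sample data are hypothetical and only serve as illustration.

```python
import json
from urllib.parse import urljoin


def extract_event_urls(script_text, base_url):
    """Return absolute URLs from a JSON array embedded in a script tag.

    Assumes the JSON is a list of objects with a 'url' key;
    entries without one are skipped.
    """
    # strict=False tolerates raw control characters (e.g. newlines) in strings
    events = json.loads(script_text, strict=False)
    urls = []
    for event in events:
        url = event.get("url") if isinstance(event, dict) else None
        if url:
            # Resolve relative links, similar to response.urljoin(url) in Scrapy
            urls.append(urljoin(base_url, url))
    return urls


# Hypothetical sample standing in for the real script content
sample = (
    '[{"name": "A", "url": "/bangalore/events/a"},'
    ' {"name": "B", "url": "https://highape.com/b"}]'
)
for link in extract_event_urls(sample, "https://highape.com/bangalore/all-events"):
    print(link)
```

Inside the spider's parse method, the script text could be obtained with response.xpath('(//script)[6]/text()').get(), which corresponds to find_all('script')[5] above; the position of the script tag is an assumption that may break whenever the page layout changes. Each returned URL can then be passed to Request(url, callback=self.parse_event).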
