Crawling data but the URL doesn't change


Question

I want to crawl data from this webpage using Python:

https://www.discountoptiondata.com/freedata/

I want to keep the same values for the expiration date and symbol while iterating over all values of the start date. The problem is that the URL stays the same for every combination, so I cannot build a list of URLs to crawl.

Does anybody have ideas about how I can do that?

Recommended answer

The website you are trying to parse is dynamic, which means it runs some code in your browser after the page is downloaded. In your case, that code is set to fetch the data when the "Get OptionData" button is clicked.

You can actually watch the browser fetch the data in the Network tab of your browser's Developer Tools: F12 → Network → (refresh the page) → fill out the form and click "Get OptionData". The fetch will show up as an XHR request in the Network tab list.

Each record in the response to the data fetch will look a bit like this:

{
    "AskPrice": "5.7",
    "AskSize": "",
    "BidPrice": "0.85",
    "ExpirationDate": "2019-06-21",
    "LastPrice": "4.4",
    "StrikePrice": "1000",
    "Symbol": "SPX"
}
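As a small illustration of working with such a record, the sketch below parses the sample above with the standard `json` module. Note that the numeric fields arrive as strings and that `AskSize` is empty here, so the conversion helper `to_float` (a hypothetical name, not part of the original answer) maps empty strings to `None` rather than failing.

```python
import json

# Sample record copied from the response shown above.
SAMPLE = """{
    "AskPrice": "5.7",
    "AskSize": "",
    "BidPrice": "0.85",
    "ExpirationDate": "2019-06-21",
    "LastPrice": "4.4",
    "StrikePrice": "1000",
    "Symbol": "SPX"
}"""


def to_float(value):
    """Convert a numeric string to float; return None for empty or missing values."""
    return float(value) if value not in ("", None) else None


record = json.loads(SAMPLE)
ask = to_float(record["AskPrice"])
bid = to_float(record["BidPrice"])
ask_size = to_float(record["AskSize"])  # empty string in the sample

print(record["Symbol"], ask, bid, ask_size)
```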

The data returned from the fetch is encoded as JSON, which, luckily for us, is very easy to parse in Python. You can find the JSON above by inspecting the XHR request in the Network tab. This was the URL for me:

https://www.discountoptiondata.com/freedata/getoptiondatajson?symbol=spx&datadate=2018-06-01&expirationDate=2018-06-15
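So although the page URL never changes, the request URL does: the form values travel in the query string. The sketch below splits the URL above into its endpoint and parameters with `urllib.parse`, then rebuilds it with a different `datadate`. It assumes that `datadate` is the parameter behind the start date field on the form; the replacement date is only an example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = (
    "https://www.discountoptiondata.com/freedata/getoptiondatajson"
    "?symbol=spx&datadate=2018-06-01&expirationDate=2018-06-15"
)

parts = urlsplit(url)
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"

# parse_qs returns a list of values per key; each key appears once here.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

# Change only the (assumed) start date and rebuild the request URL.
params["datadate"] = "2018-06-04"
new_url = f"{endpoint}?{urlencode(params)}"

print(new_url)
```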

I am unfamiliar with Scrapy, but for fetching and parsing JSON I would recommend the requests module. Here is an example program that fetches the data shown on the webpage:

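import requests

# Endpoint the page calls when "Get OptionData" is clicked.
ROOT_URL = "https://www.discountoptiondata.com/freedata/getoptiondatajson"


def fetch_option_data(symbol, datadate, expiration_date):
    """Fetch the option records for one symbol, data date, and expiration date."""
    params = {
        "symbol": symbol,
        "datadate": datadate,
        "expirationDate": expiration_date,
    }
    response = requests.get(ROOT_URL, params=params, timeout=30)
    response.raise_for_status()  # raise an exception on HTTP error statuses
    return response.json()


data = fetch_option_data('spx', '2018-06-01', '2018-06-15')

# Print two fields from each returned record.
for item in data:
    print("AskPrice:", item['AskPrice'], "Last Price:", item["LastPrice"])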
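To cover the original goal of iterating over start dates while the symbol and expiration date stay fixed, one option is to generate the candidate dates and call the fetch function once per date. The sketch below is only an assumption about how that might look: `weekdays_between` and `fetch_range` are hypothetical helper names, weekends are skipped on the guess that no data exists for them, and market holidays are not handled.

```python
from datetime import date, timedelta


def weekdays_between(start, end):
    """Yield ISO-formatted dates from start to end inclusive, skipping weekends."""
    current = start
    while current <= end:
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            yield current.isoformat()
        current += timedelta(days=1)


def fetch_range(fetch, symbol, start, end, expiration_date):
    """Call fetch(symbol, datadate, expiration_date) once per candidate start date.

    Returns a dict that maps each date string to whatever fetch returned.
    """
    results = {}
    for datadate in weekdays_between(start, end):
        results[datadate] = fetch(symbol, datadate, expiration_date)
    return results
```

With the `fetch_option_data` function from the program above, a call might look like `fetch_range(fetch_option_data, 'spx', date(2018, 6, 1), date(2018, 6, 15), '2018-06-15')`. Passing the fetch function as an argument keeps the date logic testable without network access; pausing between requests (for example with `time.sleep`) is a reasonable courtesy to the server.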
