Web scraping data from an interactive graph on a website


Question

I am trying to access the data behind the graph on the following page: https://www.prisjakt.nu/produkt.php?pu=5183925

I can access and extract the data from the table below the graph, but I cannot fetch the data from the graph itself, which is rendered dynamically with JavaScript. I know that BeautifulSoup alone is not sufficient here. I tried poking around in the browser console to see the contents of the graph, without success.

I also looked at view-source:https://www.prisjakt.nu/produkt.php?pu=5183925 to see how the graph is rendered, and found this element:

<div class="graph" data-testid="graph" data-test="PriceHistoryGraph">

I want to print the price history of an item from the website, for example something like the snippet below, which is JSON that I found in the page source:

"nodes":[{"date":"2019-09-10","lowestPrice":13195},{"date":"2019-09-11","lowestPrice":12990},{"date":"2019-09-12","lowestPrice":12990},

I suspect that the above data can be found in

<rect class = "vx-bar" ...... where data="[Object Object][Object Object][Object Object]..." 

which looks like a list of arrays with two elements each, similar to the "nodes" snippet above. Is that right?

Below is the simple piece of code I am using at the moment to get a brief idea; it prints the entire layout, including the graph and the table below it.

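from time import sleep
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

my_url = 'https://www.prisjakt.nu/produkt.php?pu=5183925'
headers = {'User-Agent': 'Mozilla/5.0'}  # assumed; the original headers were not shown
driver = webdriver.Chrome()  # assumed; the original driver setup was not shown
driver.get(my_url)
sleep(3)

page = requests.get(my_url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')  # avoid shadowing the parser class
data = soup.find_all(id="statistics")
print(data)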

Any suggestions with an example or a solution would help me. Thanks in advance!

Recommended answer

You're right, the graph is constructed dynamically, but you can easily grab that data.

Here's how:

import requests

response = requests.get('https://www.prisjakt.nu/_internal/graphql?release=2020-11-20T07:33:45Z|db08e4bc&version=6f2bf5&main=product&variables={"id":5183925,"offset":0,"section":"statistics","statisticsTime":"1970-01-02","marketCode":"se","personalizationExcludeCategories":[],"userActions":true,"badges":true,"media":true,"campaign":true,"relatedProducts":true,"campaignDeals":true,"priceHistory":true,"recommendations":true,"campaignId":2,"personalizationClientId":"","pulseEnvironmentId":"sdrn:schibsted:environment:undefined"}').json()


for node in response["data"]["product"]["statistics"]["nodes"]:
    print(f"{node['date']} - {node['lowestPrice']}")
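The long query string above can also be assembled from a dictionary, which makes it easier to swap in another product id. What follows is a minimal sketch using only the standard library, not a definitive implementation: `build_graphql_url` and `extract_price_history` are hypothetical helper names, the `release` and `version` values are simply copied from the URL above and will probably stop working once the site is redeployed, and the sample payload only mimics the response shape used in the loop above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.prisjakt.nu/_internal/graphql"


def build_graphql_url(product_id):
    """Assemble the same query string as the one-line URL above (hypothetical helper)."""
    variables = {
        "id": product_id,
        "offset": 0,
        "section": "statistics",
        "statisticsTime": "1970-01-02",
        "marketCode": "se",
        "personalizationExcludeCategories": [],
        "userActions": True,
        "badges": True,
        "media": True,
        "campaign": True,
        "relatedProducts": True,
        "campaignDeals": True,
        "priceHistory": True,
        "recommendations": True,
        "campaignId": 2,
        "personalizationClientId": "",
        "pulseEnvironmentId": "sdrn:schibsted:environment:undefined",
    }
    params = {
        # Copied from the URL above; assumed to be tied to one site release.
        "release": "2020-11-20T07:33:45Z|db08e4bc",
        "version": "6f2bf5",
        "main": "product",
        "variables": json.dumps(variables, separators=(",", ":")),
    }
    return f"{BASE_URL}?{urlencode(params)}"


def extract_price_history(payload):
    """Return (date, lowest_price) pairs from a decoded response (hypothetical helper)."""
    nodes = payload["data"]["product"]["statistics"]["nodes"]
    return [(node["date"], node["lowestPrice"]) for node in nodes]


# Sample payload shaped like the real response (first three nodes only).
sample = {"data": {"product": {"statistics": {"nodes": [
    {"date": "2019-09-10", "lowestPrice": 13195},
    {"date": "2019-09-11", "lowestPrice": 12990},
    {"date": "2019-09-12", "lowestPrice": 12990},
]}}}}

for date, price in extract_price_history(sample):
    print(f"{date} - {price}")
```

With `requests` installed, the live data could then be read with `extract_price_history(requests.get(build_graphql_url(5183925)).json())`, assuming the endpoint still accepts these parameters.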

Output:

2019-09-10 - 13195
2019-09-11 - 12990
2019-09-12 - 12990
2019-09-13 - 12605
2019-09-14 - 12605
2019-09-15 - 12605
2019-09-16 - 12970
2019-09-17 - 12970
2019-09-18 - 12970
2019-09-19 - 12969
2019-09-20 - 12969
2019-09-21 - 12969
2019-09-22 - 12969
2019-09-23 - 9195
2019-09-24 - 12970
and so on...
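Since the question mentions that a `"nodes":[...]` array is already visible in the page source, another option might be to decode it straight from the HTML returned by `requests.get(my_url).text`. The sketch below is only an illustration under assumptions: `nodes_from_page_source` is a hypothetical helper, the first `"nodes":` key in the page is assumed to be the price history, and the `<script>` wrapper in the sample string is invented for the demonstration.

```python
import json


def nodes_from_page_source(html):
    """Decode the first JSON array that follows a '"nodes":' key (hypothetical helper)."""
    marker = '"nodes":'
    start = html.find(marker)
    if start == -1:
        return []
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever text comes after it.
    nodes, _end = json.JSONDecoder().raw_decode(html, start + len(marker))
    return nodes


# Invented page fragment; only the "nodes" entries come from the question.
sample_html = (
    '<script>window.state = {"statistics":{"nodes":['
    '{"date":"2019-09-10","lowestPrice":13195},'
    '{"date":"2019-09-11","lowestPrice":12990}]}};</script>'
)

for node in nodes_from_page_source(sample_html):
    print(f"{node['date']} - {node['lowestPrice']}")
```

If the page embeds several `"nodes"` arrays, the marker would need to be made more specific before this approach is reliable.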
