Web scraping data from an interactive graph on a website
Problem description
I am trying to access the data in the graph on the following website: https://www.prisjakt.nu/produkt.php?pu=5183925
I am able to access and extract data from the table below the graph, but I can't fetch the data from the graph itself, which is rendered dynamically with JavaScript. I know BeautifulSoup alone is not sufficient here. I tried poking around in the browser console to see the contents of the graph, but without success.
I also looked at view-source:https://www.prisjakt.nu/produkt.php?pu=5183925 to see how the graph is loaded. The graph container in the source is:
<div class="graph" data-testid="graph" data-test="PriceHistoryGraph">
I am trying to print the price history of an item from the website, for example something like the snippet below, which is JSON I found in the page source ("view source"):
"nodes":[{"date":"2019-09-10","lowestPrice":13195},{"date":"2019-09-11","lowestPrice":12990},{"date":"2019-09-12","lowestPrice":12990},
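If the price history really is embedded in the page source as plain JSON, as the snippet above suggests, it can be pulled out without rendering any JavaScript. The following is a minimal sketch under two assumptions: the array appears literally after a "nodes": key (not escaped inside a JavaScript string), and the first such key in the page is the price history. extract_nodes is a hypothetical helper name, and the sample string is a stand-in built from the snippet above, not real page source.

```python
import json


def extract_nodes(page_source: str) -> list:
    """Return the JSON array that follows the first '"nodes":' key in page_source.

    Assumes the price history is embedded in the HTML as plain JSON (not as an
    escaped string inside a <script>). Returns [] when no such array is found.
    """
    key = '"nodes":'
    start = page_source.find(key)
    if start == -1:
        return []
    idx = start + len(key)
    # raw_decode does not skip leading whitespace, so skip it by hand
    while idx < len(page_source) and page_source[idx].isspace():
        idx += 1
    try:
        # Decode exactly one JSON value starting at idx; ignore whatever follows it
        value, _end = json.JSONDecoder().raw_decode(page_source, idx)
    except json.JSONDecodeError:
        return []
    # Only accept a list; any other JSON value under "nodes" is not the price history
    return value if isinstance(value, list) else []


# Tiny stand-in for real page source, built from the snippet above
sample = 'window.x = {"nodes":[{"date":"2019-09-10","lowestPrice":13195},{"date":"2019-09-11","lowestPrice":12990}]};'
for node in extract_nodes(sample):
    print(node["date"], node["lowestPrice"])
```

json.JSONDecoder().raw_decode is used instead of a regex because it stops exactly at the end of the array, however deeply nested the objects inside are. If the page contains several "nodes" keys, check that the returned items actually have date and lowestPrice fields.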
I suspect the above data can be found in
<rect class = "vx-bar" ...... where data="[Object Object][Object Object][Object Object]..."
which is a list of arrays, each with two elements, similar to the "nodes" snippet above. Is that right?
Here is the simple code I am using at the moment, to give a brief idea. It prints the entire layout, including the graph and the table below it.
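import requests
from bs4 import BeautifulSoup

my_url = 'https://www.prisjakt.nu/produkt.php?pu=5183925'
# `headers` was undefined in the original; a browser-like User-Agent is assumed here
headers = {'User-Agent': 'Mozilla/5.0'}

# requests only gets the server-sent HTML; the graph is drawn later by JavaScript
page = requests.get(my_url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')  # don't shadow the BeautifulSoup class
data = soup.find_all(id="statistics")
print(data)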
Any suggestions, ideally with an example or a solution, would help. Thanks in advance!
Recommended answer
是的,该图是动态构建的,但是您可以轻松地获取该数据.
You're right, the graph is constructed dynamically, but you can easily grab that data from the site's internal GraphQL endpoint. Here's how:
import requests

# Query the site's internal GraphQL endpoint directly; "id" in `variables` is the product id
response = requests.get('https://www.prisjakt.nu/_internal/graphql?release=2020-11-20T07:33:45Z|db08e4bc&version=6f2bf5&main=product&variables={"id":5183925,"offset":0,"section":"statistics","statisticsTime":"1970-01-02","marketCode":"se","personalizationExcludeCategories":[],"userActions":true,"badges":true,"media":true,"campaign":true,"relatedProducts":true,"campaignDeals":true,"priceHistory":true,"recommendations":true,"campaignId":2,"personalizationClientId":"","pulseEnvironmentId":"sdrn:schibsted:environment:undefined"}').json()

# Each node is one day's lowest price
for node in response["data"]["product"]["statistics"]["nodes"]:
    print(f"{node['date']} - {node['lowestPrice']}")
Output:
2019-09-10 - 13195
2019-09-11 - 12990
2019-09-12 - 12990
2019-09-13 - 12605
2019-09-14 - 12605
2019-09-15 - 12605
2019-09-16 - 12970
2019-09-17 - 12970
2019-09-18 - 12970
2019-09-19 - 12969
2019-09-20 - 12969
2019-09-21 - 12969
2019-09-22 - 12969
2019-09-23 - 9195
2019-09-24 - 12970
and so on...
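The query string in the answer is JSON pasted straight into a URL, which is awkward to edit. As a sketch, the same request can be built from a Python dict with urllib.parse.urlencode, so that, for example, the product id becomes a parameter, and the response can be turned into (date, price) pairs. All parameter values are copied from the answer's URL; release and version look like site-build identifiers and may need updating over time, which is an assumption. build_price_history_url and parse_price_history are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

# Endpoint and parameter values copied from the answer's URL
GRAPHQL_URL = "https://www.prisjakt.nu/_internal/graphql"


def build_price_history_url(product_id: int, market_code: str = "se") -> str:
    """Build the GraphQL GET URL for one product's statistics section."""
    variables = {
        "id": product_id,
        "offset": 0,
        "section": "statistics",
        "statisticsTime": "1970-01-02",
        "marketCode": market_code,
        "personalizationExcludeCategories": [],
        "userActions": True,
        "badges": True,
        "media": True,
        "campaign": True,
        "relatedProducts": True,
        "campaignDeals": True,
        "priceHistory": True,
        "recommendations": True,
        "campaignId": 2,
        "personalizationClientId": "",
        "pulseEnvironmentId": "sdrn:schibsted:environment:undefined",
    }
    params = {
        # Assumed to be build identifiers of the site; they may go stale
        "release": "2020-11-20T07:33:45Z|db08e4bc",
        "version": "6f2bf5",
        "main": "product",
        # Compact JSON, same shape as the hand-written query string
        "variables": json.dumps(variables, separators=(",", ":")),
    }
    # urlencode percent-encodes the JSON so the URL is well-formed
    return f"{GRAPHQL_URL}?{urlencode(params)}"


def parse_price_history(response: dict) -> list:
    """Turn the decoded GraphQL JSON into (date, lowestPrice) tuples, in response order."""
    nodes = response["data"]["product"]["statistics"]["nodes"]
    return [(node["date"], node["lowestPrice"]) for node in nodes]
```

Usage, assuming requests is installed: history = parse_price_history(requests.get(build_price_history_url(5183925)).json()). Keeping the URL construction and the parsing in separate functions means the parsing can be checked against a saved response without hitting the site.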