Python scraping of a site that contains PowerBI graphs

Question

I am trying to scrape the following URL in Python: https://msdh.ms.gov/msdhsite/_static/14,21995,420,873.html

What I need to get is the following:

If you open Chrome DevTools while on the above URL, go to the XHR section of the Network tab, and select any entry in the "Name" column such as "querydata?synchronous=true", the "Request Headers" section shows a "Referer" URL. I need to obtain that URL from my Python code so that I can decode its query string and build my scraper.

This is how to navigate to the above URL:

From:

https://msdh.ms.gov/ (click the COVID-19 overview link)

Then from:

https://msdh.ms.gov/msdhsite/_static/14,0,420.html (click Interactive maps and charts)

And finally from:

https://msdh.ms.gov/msdhsite/_static/14,0,420,873.html (click Interactive charts: COVID-19 epidemiological charts and trends)

Does anyone have an idea how to get that Referer URL?

Recommended answer

It was fun to figure out how to get info out of this site:

import re
import json
import base64
import requests
from bs4 import BeautifulSoup


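url = 'https://msdh.ms.gov/msdhsite/_static/14,21995,420,873.html'

# The report is embedded in an <iframe>: fetch the page, then the iframe document.
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
html_data = requests.get(soup.iframe['src']).text

# The text after the last '=' in the iframe URL is base64-encoded JSON
# that holds the resource key ('k') and the tenant id ('t').
d = json.loads(base64.b64decode(soup.iframe['src'].split('=')[-1]).decode('utf-8'))

tenantId = d['t']
resourceKey = d['k']
# The remaining identifiers are JavaScript variables inside the iframe document.
resolvedClusterUri = re.search(r"var resolvedClusterUri = '(.*?)'", html_data)[1].replace('-redirect', '-api')
requestId = re.search(r"var requestId = '(.*?)'", html_data)[1]
activityId = re.search(r"var telemetrySessionId =  '(.*?)'", html_data)[1]

# First endpoint: the report's models and exploration (sections and their visuals).
url = resolvedClusterUri + "/public/reports/" + resourceKey + "/modelsAndExploration?preferReadOnlySession=true"
# Second endpoint: runs a visual's query and returns its data.
query_url = resolvedClusterUri + "/public/reports/querydata?synchronous=true"
headers = {'ActivityId': activityId, 'RequestId': requestId, 'X-PowerBI-ResourceKey': resourceKey}
data = requests.get(url, headers=headers).json()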

# Only the first visual of each section is queried, and only if it defines a query.
for s in data['exploration']['sections']:
    if 'query' in s['visualContainers'][0]:

        payload = {
          "version": "1.0.0",
          "queries": [
            {
              "Query": json.loads(s['visualContainers'][0]['query']),
              "CacheKey": '',
              "QueryId": "",
              "ApplicationContext": {
                "DatasetId": data['models'][0]['dbName'],
                "Sources": [
                  {
                    "ReportId": data['exploration']['report']['objectId']
                  }
                ]
              }
            }
          ],
          "cancelQueries": [],
          "modelId": data['models'][0]['id']
        }

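        # Run the visual's query against the querydata endpoint.
        section_data = requests.post(query_url, json=payload, headers=headers).json()

        # Print the section name followed by its raw result rows.
        print(s['displayName'])
        print(section_data['results'][0]['result']['data']['dsr']['DS'][0]['PH'])
        print('-' * 80)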

Prints:

Gender
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['Female', 12029]}, {'C': ['Male', 8469]}, {'C': ['Unknown', 143]}]}]
--------------------------------------------------------------------------------
Cases and Deaths by Age
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}], 'C': ['<18', 1480, 0]}, {'C': ['18-29', 3876, 5]}, {'C': ['30-39', 3272, 16]}, {'C': ['40-49', 3387, 40]}, {'C': ['50-59', 3106, 73]}, {'C': ['60-69', 2550, 199]}, {'C': ['70-79', 1555, 259]}, {'C': ['80-89', 993, 217]}, {'C': ['90+', 420, 129]}]}]
--------------------------------------------------------------------------------
Pediatric
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['<1', 95]}, {'C': ['1-5', 299]}, {'C': ['6-10', 340]}, {'C': ['11-17', 746]}]}]
--------------------------------------------------------------------------------
Hospitalized by Age Group
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['<18', 31]}, {'C': ['18-29', 114]}, {'C': ['30-39', 176]}, {'C': ['40-49', 328]}, {'C': ['50-59', 461]}, {'C': ['60-69', 635]}, {'C': ['70-79', 553]}, {'C': ['80-89', 332]}, {'C': ['90+', 137]}]}]
--------------------------------------------------------------------------------
Hospitalized
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['No', 13849]}, {'C': ['Yes', 2767]}, {'C': ['Unknown', 1236]}]}]
--------------------------------------------------------------------------------
Deaths by Race and Ethicity
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['Black (NH)', 463]}, {'C': ['White (NH)', 372]}, {'C': ['American Indian or Alaska Native (NH)', 35]}, {'C': ['Hispanic**', 15]}, {'C': ['Other (NH)', 1]}, {'C': ['Asian (NH)', 0]}]}]
--------------------------------------------------------------------------------
Underlying Condition
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}, {'N': 'M2', 'T': 4}, {'N': 'M3', 'T': 4}, {'N': 'M4', 'T': 4}, {'N': 'M5', 'T': 4}], 'C': ['Hypertension', 311, 242, 21, 9, 1, 0]}, {'C': ['Cardiovascular Disease', 248, 197, 13, 6], 'R': 96}, {'C': ['Diabetes', 243, 129, 20, 3], 'R': 96}, {'C': ['Obesity', 168, 83, 9, 4, 0], 'R': 64}, {'C': ['Renal Disease', 129, 64, 13, 2], 'R': 96}, {'C': ['Lung Disease', 123, 108, 3, 1], 'R': 96}, {'C': ['Neurologic Conditions', 110, 143, 5, 2], 'R': 96}, {'C': ['Immunocompromised', 63, 49, 3, 1], 'R': 96}, {'C': ['Liver Disease', 14, 19, 2], 'R': 112}, {'C': ['None Noted', 2, 2, 0, 0, 1], 'R': 64}]}]
--------------------------------------------------------------------------------
Epi Curve
[{'DM0': [{'S': [{'N': 'G0', 'T': 7}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 3}], 'C': [1580515200000, 1], 'Ø': 4}, {'C': [1580688000000], 'R': 6}, {'C': [1581120000000], 'R': 6}, {'C': [1581379200000], 'R': 6}, {'C': [1581465600000], 'R': 6}, {'C': [1581638400000, 2], 'R': 4}, {'C': [1581811200000, 1], 'R': 4}, {'C': [1582156800000, '1.1428571428571428'], 'R': 2}, {'C': [1582243200000], 'R': 6}, {'C': [1582416000000], 'R': 6}, {'C': [1582675200000, 2, '1.2857142857142858']}, {'C': [1582848000000, 3, '1.5714285714285714']}, {'C': [1582934400000, '1.7142857142857142'], 'R': 2}, {'C': [1583020800000, 12, '3.2857142857142856']}, {'C': [1583107200000, 5, '3.8571428571428572']}, {'C': [1583280000000, '4.4285714285714288'], 'R': 2}, {'C': [1583366400000, 1], 'R': 4}, {'C': [1583452800000, 12, '5.8571428571428568']}, {'C': [1583539200000, 8, '6.5714285714285712']}, {'C': [1583625600000, 11, '7.7142857142857144']}, {'C': [1583712000000, 35, 11]}, {'C': [1583798400000, 25, '13.857142857142858']}, {'C': [1583884800000, 23, '16.428571428571427']}, {'C': [1583971200000, 36, '21.428571428571427']}, {'C': [1584057600000, '24.857142857142858'], 'R': 2}, {'C': [1584144000000, 40, '29.428571428571427']}, {'C': [1584230400000, 71, 38]}, {'C': [1584316800000, 94, '46.428571428571431']}, {'C': [1584403200000, 81, '54.428571428571431']}, {'C': [1584489600000, 112, '67.142857142857139']}, {'C': [1584576000000, 94, '75.428571428571431']}, {'C': [1584662400000, 123, '87.857142857142861']}, {'C': [1584748800000, 98, '96.142857142857139']}, {'C': [1584835200000, 92, '99.142857142857139']}, {'C': [1584921600000, 164, '109.14285714285714']}, {'C': [1585008000000, 137, '117.14285714285714']}, {'C': [1585094400000, 132, 120]}, {'C': [1585180800000, 119, '123.57142857142857']}, {'C': [1585267200000, 164, '129.42857142857142']}, {'C': [1585353600000, 101, '129.85714285714286']}, {'C': [1585440000000, 103, '131.42857142857142']}, {'C': [1585526400000, 162, '131.14285714285714']}, {'C': 
[1585612800000, 150, 133]}, {'C': [1585699200000, '135.57142857142858'], 'R': 2}, {'C': [1585785600000, 149, '139.85714285714286']}, {'C': [1585872000000, 156, '138.71428571428572']}, {'C': [1585958400000, 143, '144.71428571428572']}, {'C': [1586044800000, 127, '148.14285714285714']}, {'C': [1586131200000, 206, '154.42857142857142']}, {'C': [1586217600000, 135, '152.28571428571428']}, {'C': [1586304000000, 178, '156.28571428571428']}, {'C': [1586390400000, 183, '161.14285714285714']}, {'C': [1586476800000, 178, '164.28571428571428']}, {'C': [1586563200000, 146, '164.71428571428572']}, {'C': [1586649600000, 141, '166.71428571428572']}, {'C': [1586736000000, 218, '168.42857142857142']}, {'C': [1586822400000, 214, '179.71428571428572']}, {'C': [1586908800000, 236, 188]}, {'C': [1586995200000, 191, '189.14285714285714']}, {'C': [1587081600000, 215, '194.42857142857142']}, {'C': [1587168000000, 171, 198]}, {'C': [1587254400000, 181, '203.71428571428572']}, {'C': [1587340800000, 307, '216.42857142857142']}, {'C': [1587427200000, 259, '222.85714285714286']}, {'C': [1587513600000, 280, '229.14285714285714']}, {'C': [1587600000000, 235, '235.42857142857142']}, {'C': [1587686400000, 276, '244.14285714285714']}, {'C': [1587772800000, 193, '247.28571428571428']}, {'C': [1587859200000, 170, '245.71428571428572']}, {'C': [1587945600000, 317, '247.14285714285714']}, {'C': [1588032000000, 278, '249.85714285714286']}, {'C': [1588118400000, 298, '252.42857142857142']}, {'C': [1588204800000, 245, '253.85714285714286']}, {'C': [1588291200000, 282, '254.71428571428572']}, {'C': [1588377600000, 193], 'R': 4}, {'C': [1588464000000, 148, '251.57142857142858']}, {'C': [1588550400000, 310, '250.57142857142858']}, {'C': [1588636800000, 290, '252.28571428571428']}, {'C': [1588723200000, 365, '261.85714285714283']}, {'C': [1588809600000, 274, 266]}, {'C': [1588896000000, 273, '264.71428571428572']}, {'C': [1588982400000, 168, '261.14285714285717']}, {'C': [1589068800000, 194, 
'267.71428571428572']}, {'C': [1589155200000, 356, '274.28571428571428']}, {'C': [1589241600000, 308, '276.85714285714283']}, {'C': [1589328000000, 274, '263.85714285714283']}, {'C': [1589414400000, 267, '262.85714285714283']}, {'C': [1589500800000, 332, '271.28571428571428']}, {'C': [1589587200000, 224, '279.28571428571428']}, {'C': [1589673600000, 172, '276.14285714285717']}, {'C': [1589760000000, 366, '277.57142857142856']}, {'C': [1589846400000, 302, '276.71428571428572']}, {'C': [1589932800000, 333, '285.14285714285717']}, {'C': [1590019200000, 355, '297.71428571428572']}, {'C': [1590105600000, 311, '294.71428571428572']}, {'C': [1590192000000, 236, '296.42857142857144']}, {'C': [1590278400000, 250, '307.57142857142856']}, {'C': [1590364800000, 291, '296.85714285714283']}, {'C': [1590451200000, 384, '308.57142857142856']}, {'C': [1590537600000, 383, '315.71428571428572']}, {'C': [1590624000000, 404, '322.71428571428572']}, {'C': [1590710400000, 306, 322]}, {'C': [1590796800000, 201, 317]}, {'C': [1590883200000, 179, '306.85714285714283']}, {'C': [1590969600000, 290, '306.71428571428572']}, {'C': [1591056000000, 256, '288.42857142857144']}, {'C': [1591142400000, 289], 'Ø': 4}, {'C': [1591228800000, 218], 'R': 4}, {'C': [1591315200000, 260], 'R': 4}, {'C': [1591401600000, 169], 'R': 4}, {'C': [1591488000000, 120], 'R': 4}, {'C': [1591574400000, 227], 'R': 4}, {'C': [1591660800000, 222], 'R': 4}, {'C': [1591747200000, 234], 'R': 4}, {'C': [1591833600000, 263], 'R': 4}, {'C': [1591920000000, 251], 'R': 4}, {'C': [1592006400000, 204], 'R': 4}, {'C': [1592092800000, 3], 'R': 4}, {'C': [1592179200000, 1], 'R': 4}, {'C': [1592265600000, 0], 'R': 4}]}]
--------------------------------------------------------------------------------
Ethnicity/Race
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['Black (NH)', 9865]}, {'C': ['White (NH)', 5003]}, {'C': ['Hispanic**', 1134]}, {'C': ['American Indian or Alaska Native (NH)', 298]}, {'C': ['Other (NH)', 270]}, {'C': ['Asian (NH)', 58]}]}]
--------------------------------------------------------------------------------
Deaths Gender x Race
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}, {'N': 'M2', 'T': 4}, {'N': 'M3', 'T': 4}, {'N': 'M4', 'T': 4}], 'C': ['Male', 247, 182, 8, 0, 23]}, {'C': ['Female', 229, 213, 4, 29], 'R': 16}]}]
--------------------------------------------------------------------------------
Gender by Ethnicity/Race
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}, {'N': 'M2', 'T': 4}, {'N': 'M3', 'T': 4}, {'N': 'M4', 'T': 4}, {'N': 'M5', 'T': 4}], 'C': ['Female', 6191, 2776, 494, 177, 141, 28]}, {'C': ['Male', 3641, 2209, 637, 121, 127, 30]}]}]
--------------------------------------------------------------------------------
LTCF
[{'DM0': [{'S': [{'N': 'G0', 'T': 7}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}, {'N': 'M2', 'T': 3}], 'C': [1584316800000, 0, 1], 'Ø': 8}, {'C': [1584576000000], 'R': 14}, {'C': [1584662400000], 'R': 14}, {'C': [1584748800000, 1, 2], 'R': 8}, {'C': [1585008000000, 0, 3], 'R': 8}, {'C': [1585094400000, 1], 'R': 10}, {'C': [1585180800000, 4], 'R': 10}, {'C': [1585267200000, 3, '2.2857142857142856'], 'R': 2}, {'C': [1585353600000, 1, 5, 3]}, {'C': [1585440000000, 0, 3, '3.2857142857142856']}, {'C': [1585526400000, 7, '3.8571428571428572'], 'R': 2}, {'C': [1585612800000, 2, 4, '4.2857142857142856']}, {'C': [1585699200000, 1, 12, 6]}, {'C': [1585785600000, 4, '6.1428571428571432'], 'R': 2}, {'C': [1585872000000, 0, 7, '6.7142857142857144']}, {'C': [1585958400000, 3, 8, '7.4285714285714288']}, {'C': [1586044800000, 2, 7, '8.2857142857142865']}, {'C': [1586131200000, 1, 5, '8.1428571428571423']}, {'C': [1586217600000, 2, 8, '8.7142857142857135']}, {'C': [1586304000000, 6, 3, '8.1428571428571423']}, {'C': [1586390400000, 1, 8, '8.7142857142857135']}, {'C': [1586476800000, 3, 3, '8.5714285714285712']}, {'C': [1586563200000, 2, 6, '8.1428571428571423']}, {'C': [1586649600000, 3, 3, '7.7142857142857144']}, {'C': [1586736000000, 2, 6, 8]}, {'C': [1586822400000, 3, 9, '8.2857142857142865']}, {'C': [1586908800000, 4, 8], 'R': 2}, {'C': [1586995200000, 9, 9, '9.2857142857142865']}, {'C': [1587081600000, 5, 7, '10.142857142857142']}, {'C': [1587168000000, 4, 4], 'R': 8}, {'C': [1587254400000, 6, '10.714285714285714'], 'R': 2}, {'C': [1587340800000, 8, 4, '11.285714285714286']}, {'C': [1587427200000, 6, 8, '11.571428571428571']}, {'C': [1587513600000, 0, 3, 11]}, {'C': [1587600000000, 8, 8, '10.714285714285714']}, {'C': [1587686400000, 6, 3, '10.285714285714286']}, {'C': [1587772800000, 3, 1, '9.7142857142857135']}, {'C': [1587859200000, 4, 3, '9.2857142857142865']}, {'C': [1587945600000, 12, 5, 10]}, {'C': [1588032000000, 7, 8, '10.142857142857142']}, {'C': [1588118400000, 8, 3, 
'11.285714285714286']}, {'C': [1588204800000, 9, '10.714285714285714'], 'R': 4}, {'C': [1588291200000, 7, '11.714285714285714'], 'R': 2}, {'C': [1588377600000, 11, 8, '13.857142857142858']}, {'C': [1588464000000, 9, 4, '14.714285714285714']}, {'C': [1588550400000, 3, 6, '13.571428571428571']}, {'C': [1588636800000, 13, 9, '14.571428571428571']}, {'C': [1588723200000, 11, 2, '14.857142857142858']}, {'C': [1588809600000, 8, 1, '14.428571428571429']}, {'C': [1588896000000, 3, 7, '13.571428571428571']}, {'C': [1588982400000, 8, 2, '12.285714285714286']}, {'C': [1589068800000, 1, '11.714285714285714'], 'R': 2}, {'C': [1589155200000, 6, 4, '11.857142857142858']}, {'C': [1589241600000, 9, 10, '11.428571428571429']}, {'C': [1589328000000, 11, 4, '11.714285714285714']}, {'C': [1589414400000, 12, 8, '13.285714285714286']}, {'C': [1589500800000, 13, 5, '14.428571428571429']}, {'C': [1589587200000, 9, 9, '15.571428571428571']}, {'C': [1589673600000, 14, 5, 17]}, {'C': [1589760000000, 5, 8, '17.428571428571427']}, {'C': [1589846400000, 12, 5, '17.142857142857142']}, {'C': [1589932800000, 8, 4, '16.714285714285715']}, {'C': [1590019200000, 7, 7, '15.857142857142858']}, {'C': [1590105600000, 6, '15.142857142857142'], 'R': 4}, {'C': [1590192000000, 8, 5, '14.428571428571429']}, {'C': [1590278400000, 11, 4, '13.857142857142858']}, {'C': [1590364800000, 8, 12, '14.857142857142858']}, {'C': [1590451200000, 10, 5, '14.571428571428571']}, {'C': [1590537600000, 8, 8, '15.142857142857142']}, {'C': [1590624000000, 7, 7], 'R': 8}, {'C': [1590710400000, 14, 9, '16.571428571428573']}, {'C': [1590796800000, 3, 4, '15.714285714285714']}, {'C': [1590883200000, 11, 2, '15.428571428571429']}, {'C': [1590969600000, 9, 7, '14.857142857142858']}, {'C': [1591056000000, 13, 4, '15.142857142857142']}, {'C': [1591142400000, 2, '13.714285714285714'], 'R': 4}, {'C': [1591228800000, 9, 6, '13.857142857142858']}, {'C': [1591315200000, 2, '11.714285714285714'], 'R': 4}, {'C': [1591401600000, 5, 2], 'R': 8}, 
{'C': [1591488000000, 6, 5, '11.428571428571429']}, {'C': [1591574400000, 5, '10.571428571428571'], 'R': 4}, {'C': [1591660800000, 4, '9.4285714285714288'], 'R': 2}, {'C': [1591747200000, 2], 'R': 4, 'Ø': 8}, {'C': [1591833600000, 1, 2], 'R': 8}, {'C': [1591920000000, 4, 6], 'R': 8}, {'C': [1592006400000, 5, 3], 'R': 8}, {'C': [1592092800000, 3, 5], 'R': 8}, {'C': [1592179200000, 2, 4], 'R': 8}, {'C': [1592265600000, 1, 0], 'R': 8}]}]
--------------------------------------------------------------------------------
Recovery
[{'DM0': [{'S': [{'N': 'M0', 'T': 3}], 'M0': 15323}]}]
--------------------------------------------------------------------------------
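
The rows in the output above are compressed, and the answer does not explain the `R` and `Ø` keys. Judging only from the data shown here, both look like bitmasks over the columns listed in `S`: a set bit in `R` seems to mean "repeat this column's value from the previous row", and a set bit in `Ø` seems to mean "this column is empty". That reading is an inference from the printed output, not something taken from official documentation, and the helper names `expand_rows` and `to_date` below are made up for illustration. Under that assumption, a minimal sketch for expanding the rows and converting the millisecond timestamps could look like this:

```python
from datetime import datetime, timezone


def expand_rows(dm0):
    """Expand compressed 'DM0' entries into full rows (format inferred, see above)."""
    n_cols = len(dm0[0]['S'])        # the column schema appears on the first entry only
    rows, prev = [], [None] * n_cols
    for entry in dm0:
        values = iter(entry['C'])    # values for columns that are neither repeated nor empty
        repeat = entry.get('R', 0)   # assumed: bit i set -> column i repeats the previous row
        nulls = entry.get('Ø', 0)    # assumed: bit i set -> column i has no value
        row = []
        for i in range(n_cols):
            if (repeat >> i) & 1:
                row.append(prev[i])
            elif (nulls >> i) & 1:
                row.append(None)
            else:
                row.append(next(values))
        rows.append(row)
        prev = row
    return rows


def to_date(ms):
    """Convert epoch milliseconds (as in the 'Epi Curve' rows) to a UTC date."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date()


# Sample entries copied from the 'Underlying Condition' and 'Epi Curve' output above.
underlying = [
    {'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4},
           {'N': 'M2', 'T': 4}, {'N': 'M3', 'T': 4}, {'N': 'M4', 'T': 4},
           {'N': 'M5', 'T': 4}],
     'C': ['Hypertension', 311, 242, 21, 9, 1, 0]},
    {'C': ['Cardiovascular Disease', 248, 197, 13, 6], 'R': 96},
]
epi = [
    {'S': [{'N': 'G0', 'T': 7}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 3}],
     'C': [1580515200000, 1], 'Ø': 4},
    {'C': [1580688000000], 'R': 6},
]

for row in expand_rows(underlying):
    print(row)
for ms, m0, m1 in expand_rows(epi):
    print(to_date(ms), m0, m1)
```

With the sample entries, the second 'Underlying Condition' row is filled out to seven values by copying the last two columns from the row before it. The sketch does not handle single-value results such as 'Recovery', which carry the value under a column name (`M0`) instead of a `C` list. Some numeric values also arrive as strings (for example '1.1428571428571428') and would need `float()` before use.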
