How to scrape a Tableau dashboard in which data is only displayed in a plot after clicking in a map?

Question

I am trying to scrape data from this public Tableau dashboard. My interest is in the data plotted in the time series. If I click on a specific state in the map, the time series changes to that state. Following this and this post, I got the results for the time series aggregated at the country level (with the code provided below), but I am interested in state-level data.

import requests
from bs4 import BeautifulSoup
import json
import re

# get the second tableau link
r = requests.get(
    f"https://public.tableau.com/views/MKTScoredeisolamentosocial/VisoGeral",
    params= {
        ":showVizHome":"no"
    }
)
soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)
dataUrl = f'https://public.tableau.com{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'
r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],

})

# raw string so that \d is not treated as an invalid string escape
dataReg = re.search(r'\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))


print(data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"])

I researched Tableau categories and found out that some parameters can be inserted in the URL to get the desired results, but I was unable to find such parameters. I noticed that the data I want is stored in a worksheet named "time_line_BR", where BR stands for Brazil. I would like to change this to individual states, e.g. São Paulo (SP). I also noticed some parameters in tableauData, like "current_view_id", that I suspect may be related to the data being loaded in the time series.

Is it possible to post a request where the data pulled is the same as what I see in the plots when I manually select a specific state?

Recommended answer

Edit

I've made a Python library for scraping Tableau dashboards. The implementation is more straightforward:

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/MKTScoredeisolamentosocial/VisoGeral"

ts = TS()
ts.loads(url)
dashboard = ts.getDashboard()

for t in dashboard.worksheets:
    #show worksheet name
    print(f"WORKSHEET NAME : {t.name}")
    #show dataframe for this worksheet
    print(t.data)

Run this on repl.it

When you click on the map, it triggers a call to:

POST https://public.tableau.com/{vizql_root}/sessions/{session_id}/commands/tabdoc/select

with form data like the following:

worksheet: map_state_mobile
dashboard: Visão Geral
selection: {"objectIds":[17],"selectionType":"tuples"}
selectOptions: select-options-simple

It contains the state index (here 17) and the worksheet name. I've noticed that the worksheet name is either map_state_mobile or map_state (2) when you click a state.
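As a minimal sketch, the form fields above can be assembled by a small helper. The name `build_select_payload` is my own and not part of any Tableau API; the field names and the `select-options-simple` value are simply copied from the request observed above.

```python
import json


def build_select_payload(worksheet, dashboard, object_id):
    """Build the form data for the tabdoc/select command.

    Hypothetical helper: it only mirrors the fields seen in the
    browser request, nothing more.
    """
    return {
        "worksheet": worksheet,
        "dashboard": dashboard,
        # "selection" is sent as a JSON-encoded string, not a nested object
        "selection": json.dumps({
            "objectIds": [int(object_id)],
            "selectionType": "tuples",
        }),
        "selectOptions": "select-options-simple",
    }


payload = build_select_payload("map_state_mobile", "Visão Geral", 17)
print(payload["selection"])
```

The returned dictionary can be passed as the `data` argument of `requests.post`.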

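So it is necessary to:

  • get the list of state names, in order to pick the correct index for a state
  • call the API above to select the state and extract the data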

The states are sorted in reverse alphabetical order, so the method below is not necessary if you are OK with hardcoding them, sorted like this:

['Tocantins', 'Sergipe', 'São Paulo', 'Santa Catarina', 'Roraima', 'Rondônia', 'Rio Grande do Sul', 'Rio Grande do Norte', 'Rio de Janeiro', 'Piauí', 'Pernambuco', 'Paraná', 'Paraíba', 'Pará', 'Minas Gerais', 'Mato Grosso do Sul', 'Mato Grosso', 'Maranhão', 'Goiás', 'Espírito Santo', 'Distrito Federal', 'Ceará', 'Bahia', 'Amazonas', 'Amapá', 'Alagoas', 'Acre']
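A short sketch of how a state name could be turned into the index sent to the select endpoint. It assumes the index is the 1-based position in the list above, which is how the complete scripts in this answer use it; `state_object_id` is a hypothetical helper name.

```python
STATE_NAMES = [
    'Tocantins', 'Sergipe', 'São Paulo', 'Santa Catarina', 'Roraima',
    'Rondônia', 'Rio Grande do Sul', 'Rio Grande do Norte',
    'Rio de Janeiro', 'Piauí', 'Pernambuco', 'Paraná', 'Paraíba', 'Pará',
    'Minas Gerais', 'Mato Grosso do Sul', 'Mato Grosso', 'Maranhão',
    'Goiás', 'Espírito Santo', 'Distrito Federal', 'Ceará', 'Bahia',
    'Amazonas', 'Amapá', 'Alagoas', 'Acre',
]


def state_object_id(name, names=STATE_NAMES):
    """Return the 1-based position of `name` in the list (assumed objectId)."""
    # list.index raises ValueError if the state name is unknown
    return names.index(name) + 1


print(state_object_id('São Paulo'))
```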

Otherwise, if you don't want to hardcode them (e.g. for other Tableau use cases), use the method below:

Extracting the list of state names is not straightforward, since the data is presented as follows:

{
     "secondaryInfo": {
         "presModelMap": {
            "dataDictionary": {
                "presModelHolder": {
                    "genDataDictionaryPresModel": {
                        "dataSegments": {
                            "0": {
                                "dataColumns": []
                            }
                        }
                    }
                }
            },
             "vizData": {
                     "presModelHolder": {
                         "genPresModelMapPresModel": {
                             "presModelMap": {
                                 "map_state (2)": {},
                                 "map_state_mobile": {},
                                 "time_line_BR": {},
                                 "time_line_BR_mobile": {},
                                 "total de casos": {},
                                 "total de mortes": {}
                             }
                         }
                     }
             }
         }
     }
}

My method is to go into "vizData" and then into a worksheet inside presModelMap, which has the following structure:

"presModelHolder": {
    "genVizDataPresModel": {
        "vizColumns": [],
        "paneColumnsData": {
            "vizDataColumns": [],
            "paneColumnsList": []
        }
    }
}
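As a rough sketch, the navigation through the two structures above can be wrapped in a helper. `get_viz_data` is a name I made up, and the sample dictionary is a stripped-down stand-in for the real response, which carries many more keys.

```python
def get_viz_data(data, worksheet):
    """Return the genVizDataPresModel object of one worksheet."""
    pres_model_map = (
        data["secondaryInfo"]["presModelMap"]["vizData"]
        ["presModelHolder"]["genPresModelMapPresModel"]["presModelMap"]
    )
    return pres_model_map[worksheet]["presModelHolder"]["genVizDataPresModel"]


# Synthetic stand-in with the same nesting as the real response
sample = {
    "secondaryInfo": {"presModelMap": {"vizData": {"presModelHolder": {
        "genPresModelMapPresModel": {"presModelMap": {
            "map_state (2)": {"presModelHolder": {"genVizDataPresModel": {
                "vizColumns": [],
                "paneColumnsData": {"vizDataColumns": [], "paneColumnsList": []},
            }}},
        }},
    }}}}
}

viz_data = get_viz_data(sample, "map_state (2)")
print(sorted(viz_data["paneColumnsData"]))
```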

vizDataColumns is a collection of objects with a localBaseColumnName property. Find the object whose localBaseColumnName is [state_name] and whose fieldRole is measure:

{
    "fn": "[federated.124ags61tmhyti14im1010h1elsu].[attr:state_name:nk]",
    "fnDisagg": "",
    "localBaseColumnName": "[state_name]", <============================= MATCH THIS
    "baseColumnName": "[federated.124ags61tmhyti14im1010h1elsu].[state_name]",
    "fieldCaption": "ATTR(State Name)",
    "formatStrings": [],
    "datasourceCaption": "federated.124ags61tmhyti14im1010h1elsu",
    "dataType": "cstring",
    "aggregation": "attr",
    "stringCollation": {
        "name": "LEN_RUS_S2",
        "charsetId": 0
    },
    "fieldRole": "measure", <=========================================== MATCH THIS
    "isAutoSelect": true,
    "paneIndices": [
        0  <=========================================== EXTRACT THIS
    ],
    "columnIndices": [
        7  <=========================================== EXTRACT THIS
    ]
} 

paneIndices gives the index into the paneColumnsList array, and columnIndices gives the index into the vizPaneColumns array. The vizPaneColumns array is located inside the item selected from paneColumnsList.

From there you get the indices to look up, like this:

[222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248]
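The lookup described above can be sketched as follows. `find_value_indices` is a hypothetical helper, the field role to match is passed in as a parameter, and the sample data is synthetic (only the key names are taken from the real response).

```python
def find_value_indices(viz_data, column_name, field_role):
    """Resolve paneIndices/columnIndices of a column to its valueIndices."""
    pane_data = viz_data["paneColumnsData"]
    for column in pane_data["vizDataColumns"]:
        if (column.get("localBaseColumnName") == column_name
                and column.get("fieldRole") == field_role):
            # paneIndices[0] selects the pane, columnIndices[0] the column in it
            pane = pane_data["paneColumnsList"][column["paneIndices"][0]]
            viz_column = pane["vizPaneColumns"][column["columnIndices"][0]]
            return viz_column["valueIndices"]
    raise KeyError(f"no {field_role} column named {column_name}")


# Synthetic stand-in for a worksheet's genVizDataPresModel
sample_viz_data = {
    "paneColumnsData": {
        "vizDataColumns": [
            {"fieldCaption": "other"},
            {
                "localBaseColumnName": "[state_name]",
                "fieldRole": "measure",
                "paneIndices": [0],
                "columnIndices": [1],
            },
        ],
        "paneColumnsList": [
            {"vizPaneColumns": [
                {"valueIndices": [0, 1, 2]},
                {"valueIndices": [222, 223, 224]},
            ]}
        ],
    }
}

print(find_value_indices(sample_viz_data, "[state_name]", "measure"))
```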

In the dataDictionary object, get the dataValues (as you already extracted in your question) and pick out the state names at the indices listed above.

Then you get the list of states:

['Tocantins', 'Sergipe', 'São Paulo', 'Santa Catarina', 'Roraima', 'Rondônia', 'Rio Grande do Sul', 'Rio Grande do Norte', 'Rio de Janeiro', 'Piauí', 'Pernambuco', 'Paraná', 'Paraíba', 'Pará', 'Minas Gerais', 'Mato Grosso do Sul', 'Mato Grosso', 'Maranhão', 'Goiás', 'Espírito Santo', 'Distrito Federal', 'Ceará', 'Bahia', 'Amazonas', 'Amapá', 'Alagoas', 'Acre']
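A minimal sketch of that last lookup, assuming (as the full script in this answer does) that the first entry of dataColumns with the matching dataType holds the values. `lookup_values` is a hypothetical name and the sample columns are synthetic.

```python
def lookup_values(data_columns, data_type, value_indices):
    """Pick dataValues at the given indices from the column of `data_type`."""
    for column in data_columns:
        if column["dataType"] == data_type:
            return [column["dataValues"][i] for i in value_indices]
    raise KeyError(f"no data column of type {data_type}")


# Synthetic stand-in for dataSegments["0"]["dataColumns"]
sample_columns = [
    {"dataType": "real", "dataValues": [0.1, 0.2]},
    {"dataType": "cstring", "dataValues": ["x", "Tocantins", "Sergipe"]},
]

print(lookup_values(sample_columns, "cstring", [1, 2]))
```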

Call the select endpoint

You just need the worksheet name and the index of the field (the state index in the list above):

r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabdoc/select',
    data = {
    "worksheet": "map_state (2)",
    "dashboard": "Visão Geral",
    "selection": json.dumps({
        "objectIds":[int(selected_index)],
        "selectionType":"tuples"
    }),
    "selectOptions": "select-options-simple"
})

The code below extracts the Tableau data, extracts the state names with the method above (not necessary if you prefer to hardcode the list), prompts the user to enter a state index, calls the select endpoint and extracts the data for that state:

import requests
from bs4 import BeautifulSoup
import json
import re

data_host = "https://public.tableau.com"

# get the second tableau link
r = requests.get(
    f"{data_host}/views/MKTScoredeisolamentosocial/VisoGeral",
    params= {
        ":showVizHome":"no"
    }
)
soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)
dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'
r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],
})

# raw string so that \d is not treated as an invalid string escape
dataReg = re.search(r'\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))

stateIndexInfo = [ 
    (t["fieldRole"], {
        "paneIndices": t["paneIndices"][0], 
        "columnIndices": t["columnIndices"][0], 
        "dataType": t["dataType"]
    }) 
    for t in data["secondaryInfo"]["presModelMap"]["vizData"]["presModelHolder"]["genPresModelMapPresModel"]["presModelMap"]["map_state (2)"]["presModelHolder"]["genVizDataPresModel"]["paneColumnsData"]["vizDataColumns"]
    if t.get("localBaseColumnName") and t["localBaseColumnName"] == "[state_name]"
]

stateNameIndexInfo = [t[1] for t in stateIndexInfo if t[0] == 'dimension'][0]

panelColumnList = data["secondaryInfo"]["presModelMap"]["vizData"]["presModelHolder"]["genPresModelMapPresModel"]["presModelMap"]["map_state (2)"]["presModelHolder"]["genVizDataPresModel"]["paneColumnsData"]["paneColumnsList"]
stateNameIndices = panelColumnList[stateNameIndexInfo["paneIndices"]]["vizPaneColumns"][stateNameIndexInfo["columnIndices"]]["valueIndices"]

# print [222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248]
#print(stateNameIndices)

dataValues = [
    t
    for t in data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"]
    if t["dataType"] == stateNameIndexInfo["dataType"]
][0]["dataValues"]

stateNames = [dataValues[t] for t in stateNameIndices]

# print ['Tocantins', 'Sergipe', 'São Paulo', 'Santa Catarina', 'Roraima', 'Rondônia', 'Rio Grande do Sul', 'Rio Grande do Norte', 'Rio de Janeiro', 'Piauí', 'Pernambuco', 'Paraná', 'Paraíba', 'Pará', 'Minas Gerais', 'Mato Grosso do Sul', 'Mato Grosso', 'Maranhão', 'Goiás', 'Espírito Santo', 'Distrito Federal', 'Ceará', 'Bahia', 'Amazonas', 'Amapá', 'Alagoas', 'Acre']
#print(stateNames)

for idx, val in enumerate(stateNames):
    print(f"{val} - {idx+1}")

selected_index = input("Please select a state by indices : ")
print(f"selected : {stateNames[int(selected_index)-1]}")

r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabdoc/select',
    data = {
    "worksheet": "map_state (2)",
    "dashboard": "Visão Geral",
    "selection": json.dumps({
        "objectIds":[int(selected_index)],
        "selectionType":"tuples"
    }),
    "selectOptions": "select-options-simple"
})

dataSegments = r.json()["vqlCmdResponse"]["layoutStatus"]["applicationPresModel"]["dataDictionary"]["dataSegments"]
# keys are numeric strings, so compare them as integers ("10" > "9")
print(dataSegments[max(dataSegments, key=int)]["dataColumns"])

Try this on repl.it

The code with the state name list hardcoded is more straightforward:

import requests
from bs4 import BeautifulSoup
import json

data_host = "https://public.tableau.com"

r = requests.get(
    f"{data_host}/views/MKTScoredeisolamentosocial/VisoGeral",
    params= {
        ":showVizHome":"no"
    }
)
soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)
dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'
r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],
})
stateNames = ['Tocantins', 'Sergipe', 'São Paulo', 'Santa Catarina', 'Roraima', 'Rondônia', 'Rio Grande do Sul', 'Rio Grande do Norte', 'Rio de Janeiro', 'Piauí', 'Pernambuco', 'Paraná', 'Paraíba', 'Pará', 'Minas Gerais', 'Mato Grosso do Sul', 'Mato Grosso', 'Maranhão', 'Goiás', 'Espírito Santo', 'Distrito Federal', 'Ceará', 'Bahia', 'Amazonas', 'Amapá', 'Alagoas', 'Acre']

for idx, val in enumerate(stateNames):
    print(f"{val} - {idx+1}")

selected_index = input("Please select a state by indices : ")
print(f"selected : {stateNames[int(selected_index)-1]}")

r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabdoc/select',
    data = {
    "worksheet": "map_state (2)",
    "dashboard": "Visão Geral",
    "selection": json.dumps({
        "objectIds":[int(selected_index)],
        "selectionType":"tuples"
    }),
    "selectOptions": "select-options-simple"
})

dataSegments = r.json()["vqlCmdResponse"]["layoutStatus"]["applicationPresModel"]["dataDictionary"]["dataSegments"]
# keys are numeric strings, so compare them as integers ("10" > "9")
print(dataSegments[max(dataSegments, key=int)]["dataColumns"])

Try this on repl.it

Note that, in this case, even though we don't care about the output of the first call (/bootstrapSession/sessions/{tableauData["sessionid"]}), it is still needed to validate the session_id before calling the select endpoint (otherwise the select call doesn't return anything).
