Node: Can't scrape a public Tableau dashboard?

Question

So I copied the steps from this question and converted them to Node.js:

...
app.use('/', async (request, response) => {
    const fetchedSite = await fetch('https://public.tableau.com/views/COVID-19CasesandDeathsinthePhilippines_15866705872710/Home?%3Aembed=y&%3AshowVizHome=no&%3Adisplay_count=y&%3Adisplay_static_image=y&%3AbootstrapWhenNotified=true&%3Alanguage=en&:embed=y&:showVizHome=n&:apiID=host0#navType=0&navSrc=Parse')
    const siteText = await fetchedSite.text()
    const $ = cheerio.load(siteText)
    const tsConfigJson = JSON.parse($('#tsConfigContainer').text())

    const body = {
        sheet_id: tsConfigJson.sheetId
    }

    const getTableauData = await fetch(`https://public.tableau.com${tsConfigJson.vizql_root}/bootstrapSession/sessions/${tsConfigJson.sessionid}`, {
        method: 'POST',
        body: JSON.stringify(body)
    })

    return response.status(200).send(getTableauData)
...

The only response I get is:

{"size":0,"timeout":0}

status: 500

statusText: Internal Server Error

Am I missing something here?

Recommended answer

The issue is that you are sending JSON, whereas the endpoint expects form data:

const body = new URLSearchParams();
body.append('sheet_id', tsConfigJson.sheetId);

const tableauData = await fetch(`https://public.tableau.com${tsConfigJson.vizql_root}/bootstrapSession/sessions/${tsConfigJson.sessionid}`, {
    method: 'POST',
    body: body
})
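
To make the difference concrete, here is a minimal sketch that serializes the same payload both ways without touching the network. The value 'Sheet 1' is a made-up placeholder, not taken from the dashboard. As far as I know, node-fetch also sets the Content-Type header to application/x-www-form-urlencoded on its own when the body is a URLSearchParams instance, which is why the snippet above needs no explicit headers.

```javascript
// Hypothetical payload; the real sheet_id comes from tsConfigJson.sheetId.
const payload = { sheet_id: 'Sheet 1' };

// JSON encoding: what the code in the question sent.
const jsonBody = JSON.stringify(payload);

// Form encoding: what the answer sends instead.
const formBody = new URLSearchParams(payload);

console.log(jsonBody);            // {"sheet_id":"Sheet 1"}
console.log(formBody.toString()); // sheet_id=Sheet+1
```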

The full code to get the data:

const fetch = require('node-fetch');
const cheerio = require('cheerio');

const url = 'https://public.tableau.com/views/COVID-19CasesandDeathsinthePhilippines_15866705872710/Home?';
// Query parameters from the embed URL. ":embed" and ":showVizHome" appeared twice
// in the source URL; in an object literal the last value wins, so only that one is kept.
const params = new URLSearchParams({
    ":embed": "y",
    ":showVizHome": "n",
    ":display_count": "y",
    ":display_static_image": "y",
    ":bootstrapWhenNotified": "true",
    ":language": "en",
    ":apiID": "host0"
});

(async () => {
    // Load the dashboard page and read the session config embedded in it.
    const site = await fetch(url + params);
    const html = await site.text();
    const $ = cheerio.load(html);
    const tsConfigJson = JSON.parse($('#tsConfigContainer').text());

    // The bootstrap endpoint expects form data, not JSON.
    const body = new URLSearchParams();
    body.append('sheet_id', tsConfigJson.sheetId);

    const tableauData = await fetch(`https://public.tableau.com${tsConfigJson.vizql_root}/bootstrapSession/sessions/${tsConfigJson.sessionid}`, {
        method: 'POST',
        body: body
    });
    const text = await tableauData.text();

    // The response holds two JSON documents, each prefixed by a number and a semicolon.
    const jsonRegex = /\d+;({.*})\d+;({.*})/g;
    const match = jsonRegex.exec(text);
    const info = JSON.parse(match[1]);
    const data = JSON.parse(match[2]);
    console.log(data.secondaryInfo.presModelMap.dataDictionary.presModelHolder.genDataDictionaryPresModel.dataSegments["0"].dataColumns);
})();
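
The parsing step at the end can be tried offline. The sketch below runs the same pattern (with the braces escaped) against a synthetic string shaped like what the regex expects: two JSON documents, each preceded by a number and a semicolon. The sample content, the assumption that the number is the document's length, and the helper name parseBootstrapResponse are all invented for illustration and are not real Tableau output. It also adds a guard for the case where the response does not match, which the code above would hit as a TypeError on match[1].

```javascript
// Synthetic stand-in for the bootstrapSession response body (not real Tableau output).
const first = JSON.stringify({ sheetName: 'Home' });
const second = JSON.stringify({ secondaryInfo: { rows: [1, 2, 3] } });
const sample = `${first.length};${first}${second.length};${second}`;

// Hypothetical helper: split the body into its two JSON documents.
function parseBootstrapResponse(text) {
    const match = /\d+;(\{.*\})\d+;(\{.*\})/.exec(text);
    if (match === null) {
        // e.g. an HTML error page came back instead of the expected payload
        throw new Error('Unexpected response format');
    }
    return { info: JSON.parse(match[1]), data: JSON.parse(match[2]) };
}

const parsed = parseBootstrapResponse(sample);
console.log(parsed.info.sheetName);          // Home
console.log(parsed.data.secondaryInfo.rows); // [ 1, 2, 3 ]
```

Note that `.` does not match line breaks, so this pattern assumes the whole response arrives on a single line.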

Try it on repl.it
