How to get all HTML data after all scripts and page loading is done? (puppeteer)

Question

I finally figured out how to use Node.js and installed all the libraries/extensions. Puppeteer is working, but just like my earlier attempt with XMLHttpRequest, it only gets the page's template/body, without the information I need. The page seems to be a web app: its scripts fill in the content a few seconds after it has been opened in the browser. I need to get the information inside certain tags after the whole page has loaded. I would also like to do this in pure JavaScript, if possible, because I don't use jQuery-style code, which doubles the difficulty for me...

Here is what I have so far:

    const puppeteer = require('puppeteer');
    const cheerio = require('cheerio');

    const url = "really long link with latitude and longitude";

    (async () => {
      const browser = await puppeteer.launch();
      try {
        const page = await browser.newPage();
        // By default goto() resolves at the `load` event, which can be before
        // the page's scripts have fetched and rendered the data.
        await page.goto(url);
        const html = await page.content();

        // Parse the HTML and print the text of every <strong> element.
        const $ = cheerio.load(html);
        $('strong').each((i, el) => {
          console.log($(el).text());
        });
      } catch (err) {
        console.error(err);
      } finally {
        await browser.close();
      }
    })();

I only get the template's default body elements inside the strong tags, but there should be a lot more data than just 10 items.
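This symptom fits a page whose data is fetched and rendered by scripts after the `load` event, which is what `page.goto()` waits for by default. A minimal sketch in plain JavaScript (no cheerio, no jQuery-style code), assuming the wanted data ends up in `<strong>` elements and that the finished page has more than some known number of them: wait for that condition inside the page with `page.waitForFunction()`, then read the text with `page.$$eval()`. The names `textsOf` and `scrapeStrongTexts` and the `minCount` threshold are hypothetical.

```javascript
// Hypothetical helper passed to page.$$eval. Puppeteer serializes it and runs it
// inside the browser, so it must not reference anything outside its own body.
function textsOf(elements) {
  return elements.map((el) => el.textContent.trim());
}

// Hypothetical scraper: waits until the page's own scripts have rendered more
// than `minCount` <strong> elements, then returns their text.
async function scrapeStrongTexts(url, minCount = 10) {
  // Required lazily so textsOf can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait for network activity to settle, not only for the `load` event.
    await page.goto(url, { waitUntil: 'networkidle0' });
    // Re-evaluate the condition in the page until it is true (30 s limit).
    await page.waitForFunction(
      (min) => document.querySelectorAll('strong').length > min,
      { timeout: 30000 },
      minCount
    );
    return await page.$$eval('strong', textsOf);
  } finally {
    // Close the browser even if navigation or waiting fails.
    await browser.close();
  }
}
```

Usage: `scrapeStrongTexts(url, 10).then(console.log).catch(console.error);`. If the site keeps the network busy (polling, websockets), `networkidle0` may never be reached and `goto()` will time out; in that case dropping the `waitUntil` option and relying only on `waitForFunction()` is the variant to try.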

Answer

If you want the full HTML, the same as what you see when you inspect the page in DevTools, here it is:

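    const puppeteer = require('puppeteer');

    (async function main() {
      let browser;
      try {
        browser = await puppeteer.launch();
        // Reuse the blank tab the browser opens on launch.
        const [page] = await browser.pages();

        // 'networkidle0': navigation finishes once there have been no network
        // connections for at least 500 ms, usually after the scripts loaded their data.
        await page.goto('https://example.org/', { waitUntil: 'networkidle0' });
        // Serialize the live DOM, i.e. the markup the DevTools Elements panel shows.
        const data = await page.evaluate(() => document.documentElement.outerHTML);

        console.log(data);
      } catch (err) {
        console.error(err);
      } finally {
        // Close the browser even if navigation fails, so Node can exit.
        if (browser) await browser.close();
      }
    })();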
