Can't get the fully loaded HTML for a page using Puppeteer

Problem description

I'm trying to get the full HTML for this page. It has a spreadsheet that loads slowly. The spreadsheet is included when I take a screenshot of the page, but I can't get its HTML: document.body.outerHTML excludes the spreadsheet's markup. It's as if Puppeteer is still seeing the page as it was before the spreadsheet loaded.

How do I get the fully loaded HTML including the HTML for the spreadsheet?


const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto("http://www.electproject.org/2016g", {
    timeout: 11000,
    waitUntil: "networkidle0",
  });
  await page.setViewport({
    width: 640,
    height: 880,
    deviceScaleFactor: 1,
  });
  // `format` is a page.pdf() option, not a screenshot option, so it is omitted here
  await page.screenshot({ path: "buddy-screenshot.png" }); // this screenshot displays the spreadsheet
  let html = await page.evaluate(() => document.body.outerHTML); // this returns the html excluding the spreadsheet
  await browser.close();
})();

Solution

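The spreadsheet is in an iframe, which is a separate document, so document.body.outerHTML evaluated in the top-level page does not include it. You need to get the iframe first and evaluate inside it: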

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto("http://www.electproject.org/2016g", {
    timeout: 11000,
    waitUntil: "networkidle0",
  });
  await page.setViewport({
    width: 640,
    height: 880,
    deviceScaleFactor: 1,
  });

  const spreadsheetFrame = page.frames().find(
    frame => frame.url().startsWith('https://docs.google.com/spreadsheets/')
  );

  let spreadsheetHead = await spreadsheetFrame.evaluate(
    () => document.body.querySelector('#top-bar').innerText
  );

  console.log(spreadsheetHead); // 2016 November General Election : Turnout Rates

  await browser.close();
})();
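
The snippet above reads one element from the iframe. To get the iframe's whole HTML, and to avoid an undefined frame when the spreadsheet has not attached yet, something like the following sketch may help. The helper names waitForFrameByPrefix and getFrameHtml are hypothetical (they are not part of Puppeteer); the sketch only assumes that the page object exposes frames() and that each frame exposes url() and content(), as Puppeteer's Page and Frame classes do.

```javascript
// Hypothetical helper (not a Puppeteer API): polls page.frames() until a
// frame whose URL starts with `urlPrefix` shows up, or the timeout elapses.
async function waitForFrameByPrefix(page, urlPrefix, timeoutMs = 10000, pollMs = 100) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const frame = page.frames().find(f => f.url().startsWith(urlPrefix));
    if (frame) return frame;
    if (Date.now() >= deadline) {
      throw new Error(`No frame with URL prefix ${urlPrefix} after ${timeoutMs} ms`);
    }
    // Sleep briefly before checking the frame list again.
    await new Promise(resolve => setTimeout(resolve, pollMs));
  }
}

// Hypothetical helper: resolves to the serialized HTML of the matching frame.
// frame.content() is the per-frame counterpart of page.content().
async function getFrameHtml(page, urlPrefix, timeoutMs) {
  const frame = await waitForFrameByPrefix(page, urlPrefix, timeoutMs);
  return frame.content();
}
```

Inside the async function from the answer, this could be called after page.goto() as `const html = await getFrameHtml(page, 'https://docs.google.com/spreadsheets/');`. Note that the returned HTML is whatever the frame has rendered at that moment; if the spreadsheet is still filling in its cells, waiting for a specific selector inside the frame first may be necessary.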
