How to get all HTML data after all scripts and page loading is done? (Puppeteer)

Views: 30 Published: 2021/12/17 14:01:58 Tags: javascript node.js parsing web-scraping puppeteer


Question

I finally figured out how to use Node.js and installed all the libraries and extensions. Puppeteer is working, but, as with XMLHttpRequest before, it gets only the template/body of the page, without the information I need. The scripts on the page run a few seconds after it is opened in a browser (a web app?). I need to get the information inside certain tags after the whole page has loaded. I would also like to know whether this is possible in pure JavaScript, because I do not use jQuery-style code, which doubles the difficulty for me.

Here is what I have so far.

    const puppeteer = require('puppeteer');
    const cheerio = require('cheerio');

    const url = "really long link with latitude and attitude";

    (async () => {
      const browser = await puppeteer.launch();
      try {
        const page = await browser.newPage();
        // goto() waits only for the "load" event by default,
        // so content rendered later by scripts is still missing here.
        await page.goto(url);
        const html = await page.content();
        // Parse the returned HTML and print the text of every <strong> tag.
        const $ = cheerio.load(html);
        $('strong').each(function() {
          console.log($(this).text());
        });
      } catch (err) {
        console.error(err);
      } finally {
        await browser.close();
      }
    })();

I only get the template's default elements inside the strong tags, but the page should contain far more data than those 10 items.

Recommended answer

If you want the full HTML, the same as what the browser inspector shows, here it is:

    const puppeteer = require('puppeteer');

    (async function main() {
      try {
        const browser = await puppeteer.launch();
        const [page] = await browser.pages();

        await page.goto('https://example.org/', { waitUntil: 'networkidle0' });
        const data = await page.evaluate(() => document.querySelector('*').outerHTML);

        console.log(data);

        await browser.close();
      } catch (err) {
        console.error(err);
      }
    })();
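
The question also asks for the text inside certain tags using plain JavaScript rather than jQuery-style code. The following is a minimal sketch of one way to do that, not part of the original answer. It assumes the data sits in `strong` elements, as in the question; the helper name `scrapeTexts` and its parameters are hypothetical. It relies on Puppeteer's `page.goto`, `page.waitForSelector`, and `page.$$eval`.

```javascript
// Hypothetical helper (not from the original answer): load `url`, wait for
// late-running scripts, then return the text of every `selector` match.
async function scrapeTexts(page, url, selector) {
  // 'networkidle0' resolves once there have been no network
  // connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle0' });
  // Extra guard: wait until at least one matching element is in the DOM.
  await page.waitForSelector(selector);
  // The callback runs inside the browser and receives an array of the
  // matched elements, so only standard DOM APIs are needed (no jQuery).
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.textContent.trim())
  );
}
```

Inside the answer's `main()` function, the `page.goto(...)` and `page.evaluate(...)` lines could be replaced with `const texts = await scrapeTexts(page, 'https://example.org/', 'strong');`. Note that `waitForSelector` rejects after its timeout (30 seconds by default) if nothing ever matches, so the surrounding `try`/`catch` still matters.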
