How to get all HTML data after all scripts and page loading is done? (puppeteer)
Question
I finally figured out how to use Node.js and installed all the libraries/extensions. Puppeteer is working, but just as before with XMLHttpRequest, it only gets the template/body of the page, without the information I need. All the scripts on the page kick in a few seconds after the page has been opened in the browser (it is a web app?). I need to get the information inside certain tags after the whole page has loaded. Also, is it possible to do this in pure JavaScript? I don't use jQuery-like code, so that doubles the difficulty for me...
Here is what I have so far.
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

const url = "really long link with latitude and attitude";

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // goto() resolves on the 'load' event by default, which can fire
    // before the page's scripts have fetched and rendered the data.
    await page.goto(url);
    const html = await page.content();
    // Parse the HTML snapshot with cheerio and print every <strong> text.
    const $ = cheerio.load(html);
    $('strong').each((i, el) => {
      console.log($(el).text());
    });
  } catch (err) {
    console.error(err);
  } finally {
    await browser.close();
  }
})();
I only get the template's default body elements inside the strong tags, but there should be a lot more data than just 10 items.
Recommended answer
If you want the full HTML, the same as what you see in the browser's Inspect view, here it is:
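const puppeteer = require('puppeteer');

(async function main() {
  let browser;
  try {
    browser = await puppeteer.launch();
    // A freshly launched browser already has one blank tab; reuse it.
    const [page] = await browser.pages();
    // 'networkidle0' treats navigation as finished once there have been no
    // network connections for at least 500 ms, so data-fetching scripts
    // have usually completed by then.
    await page.goto('https://example.org/', { waitUntil: 'networkidle0' });
    // Serialize the live DOM (after scripts ran), not the original source.
    const data = await page.evaluate(() => document.querySelector('*').outerHTML);
    console.log(data);
  } catch (err) {
    console.error(err);
  } finally {
    // Close the browser even if navigation fails, so Node can exit.
    if (browser) await browser.close();
  }
})();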