How to get all links from the DOM?
Question
According to https://github.com/GoogleChrome/puppeteer/issues/628, I should be able to get all links from <a href="xyz"> elements with this single line:
const hrefs = await page.$$eval('a', a => a.href);
But when I try a simple:
console.log(hrefs)
I only get:
http://example.de/index.html
... as output, which means that it could only find 1 link? But the page definitely has 12 links in the source code / DOM. Why does it fail to find them all?
Minimal example:
'use strict';

const puppeteer = require('puppeteer');

crawlPage();

function crawlPage() {
    (async () => {
        const args = [
            "--disable-setuid-sandbox",
            "--no-sandbox",
            "--blink-settings=imagesEnabled=false",
        ];
        const options = {
            args,
            headless: true,
            ignoreHTTPSErrors: true,
        };

        const browser = await puppeteer.launch(options);
        const page = await browser.newPage();
        await page.goto("http://example.de", {
            waitUntil: 'networkidle2',
            timeout: 30000
        });

        const hrefs = await page.$eval('a', a => a.href);
        console.log(hrefs);

        await page.close();
        await browser.close();
    })().catch((error) => {
        console.error(error);
    });
}
Recommended answer
In your example code you're using page.$eval, not page.$$eval. Since the former uses document.querySelector instead of document.querySelectorAll, the behaviour you describe is the expected one.
Also, you should change your pageFunction in the $$eval arguments, because $$eval passes an array of elements to the function rather than a single element:
const hrefs = await page.$$eval('a', as => as.map(a => a.href));
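To make the fix a bit easier to reuse, here is a minimal sketch that wraps the corrected call in a small helper. The name collectHrefs is hypothetical (it is not part of Puppeteer), and the filtering of empty values and duplicates is an extra assumption on my part that the original question did not ask for; drop it if you want the raw list.

```javascript
// Hypothetical helper, not part of Puppeteer's API.
// `page` is assumed to be a Puppeteer Page, i.e. something exposing $$eval.
async function collectHrefs(page) {
    // $$eval runs document.querySelectorAll in the page and hands the
    // callback an ARRAY of matched elements, so we map over it.
    const hrefs = await page.$$eval('a', anchors => anchors.map(a => a.href));

    // Assumption: anchors without an href yield an empty string; skip those
    // and remove duplicates while keeping first-seen order.
    return [...new Set(hrefs.filter(Boolean))];
}
```

In the minimal example from the question, you would call it in place of the page.$eval line, e.g. const hrefs = await collectHrefs(page);. Note that with $$eval a callback like a => a.href would read .href off the array itself and give undefined, which is why the map is needed.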