How to get all links from the DOM?

Views: 74 Published: 2021/6/23 19:00:12 Tags: javascript node.js web-crawler puppeteer headless-browser

Problem description

According to https://github.com/GoogleChrome/puppeteer/issues/628, I should be able to get all links from <a href="xyz"> elements with this single line:

const hrefs = await page.$$eval('a', a => a.href);

But when I try a simple:

console.log(hrefs)

I only get:

http://example.de/index.html

... as output, which means it found only 1 link. But the page definitely has 12 links in the source code / DOM. Why does it fail to find them all?

Minimal example:

'use strict';
const puppeteer = require('puppeteer');

crawlPage();

function crawlPage() {
    (async () => {

        const args = [
            "--disable-setuid-sandbox",
            "--no-sandbox",
            "--blink-settings=imagesEnabled=false",
        ];
        const options = {
            args,
            headless: true,
            ignoreHTTPSErrors: true,
        };

        const browser = await puppeteer.launch(options);
        const page = await browser.newPage();
        await page.goto("http://example.de", {
            waitUntil: 'networkidle2',
            timeout: 30000
        });

        const hrefs = await page.$eval('a', a => a.href);
        console.log(hrefs);

        await page.close();
        await browser.close();

    })().catch((error) => {
        console.error(error);
    });

}

Recommended answer

In your example code you're using page.$eval, not page.$$eval. The former uses document.querySelector rather than document.querySelectorAll, so it evaluates only the first matching element. The behaviour you describe is therefore the expected one.
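
The difference can be sketched without a browser. The snippet below is only an illustration under assumptions: evalFirst and evalAll are hypothetical stand-ins for page.$eval and page.$$eval, and the anchors array holds plain objects with made-up URLs in place of real <a> elements.

```javascript
// Plain objects standing in for <a> elements (hypothetical data, no browser involved).
const anchors = [
  { href: 'http://example.de/index.html' },
  { href: 'http://example.de/about.html' },
  { href: 'http://example.de/contact.html' },
];

// Hypothetical stand-in for page.$eval: like document.querySelector,
// it hands only the first match to the callback.
function evalFirst(elements, pageFunction) {
  if (elements.length === 0) {
    throw new Error('no element matches the selector');
  }
  return pageFunction(elements[0]);
}

// Hypothetical stand-in for page.$$eval: like document.querySelectorAll,
// it hands an array of all matches to the callback.
function evalAll(elements, pageFunction) {
  return pageFunction(Array.from(elements));
}

const first = evalFirst(anchors, a => a.href);
const all = evalAll(anchors, as => as.map(a => a.href));

console.log(first); // a single string: the href of the first anchor
console.log(all);   // an array with one href per anchor
```

With the single-element variant the result is one string no matter how many anchors exist, which matches the one-link output in the question.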

Also, you should change your pageFunction in the $$eval arguments. $$eval passes an array of all matched elements to the function, so the function has to map over that array:

const hrefs = await page.$$eval('a', as => as.map(a => a.href));
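
As a rough sketch of why the callback has to change, the snippet below runs both callbacks against a stub. It rests on assumptions: getAllLinks is a hypothetical helper name, and stubPage is a hand-written object that imitates only the array-passing behaviour of $$eval and ignores the selector. With real Puppeteer you would pass the page returned by browser.newPage() instead.

```javascript
// Hypothetical helper (not part of Puppeteer): collect the href of every <a>.
async function getAllLinks(page) {
  // $$eval hands the callback an array of elements, so map over it.
  return page.$$eval('a', as => as.map(a => a.href));
}

// Stub page for a dry run without a browser. Assumption: it mimics only
// the array-passing part of $$eval and ignores the selector.
const stubPage = {
  async $$eval(selector, pageFunction) {
    const matches = [
      { href: 'http://example.de/index.html' },
      { href: 'http://example.de/about.html' },
    ];
    return pageFunction(matches);
  },
};

// The callback from the question reads .href on the array itself,
// and an array has no href property.
const wrongPromise = stubPage.$$eval('a', a => a.href);

// The corrected callback maps over the array.
const linksPromise = getAllLinks(stubPage);

Promise.all([wrongPromise, linksPromise]).then(([wrong, links]) => {
  console.log(wrong); // undefined
  console.log(links); // array of two href strings
});
```

Keeping the call inside a small function that takes the page as a parameter also makes it easy to try the logic against a stub like this before launching a browser.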
