Selecting href attributes with Puppeteer
Question
I am trying to extract a few URLs from this page with Puppeteer.
However, all my script returns is undefined:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  await page.goto('https://divisare.com/');
  let projects = await page.evaluate((sel) => {
    return document.getElementsByClassName(sel)
  }, 'homepage-project-image');
  var aNode = projects[0].href;
  console.log(aNode);
  console.log(projects.length)
  browser.close();
}

run();
However, when I run something like the code below, I at least get the correct count of the links I am trying to extract.
let projects = await page.evaluate((sel) => {
  return document.getElementsByClassName(sel).length
}, 'homepage-project-image');
console.log(projects);
Am I trying to access my projects HTMLCollection incorrectly? What am I missing here? Thanks.
Recommended answer
Puppeteer cannot return a non-serialisable value, such as an HTMLCollection of DOM nodes, from a page.evaluate call (see this issue and the following PR).
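As a rough analogy for why the count comes back correctly but href does not, the sketch below uses a JSON round trip as a stand-in for a by-value transfer. It is not Puppeteer's actual mechanism; FakeAnchor and roundTrip are hypothetical names made up for this illustration. The point it shows is that href on a DOM element is an accessor on the prototype, so a by-value copy of the node does not carry it, while a plain string does.

```javascript
// Stand-in for a DOM node: like a real anchor element, `href` is an accessor
// on the prototype rather than an own data property.
class FakeAnchor {
  get href() {
    return 'https://example.com/project';
  }
}

// Hypothetical helper: approximates a by-value transfer with a JSON round trip.
// This is an analogy only, not what Puppeteer does internally.
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

const nodes = [new FakeAnchor(), new FakeAnchor()];

// Copying the node objects keeps the count but drops the prototype accessors.
const copiedNodes = roundTrip(nodes);

// Copying the extracted strings keeps everything, because strings are plain data.
const copiedHrefs = roundTrip(nodes.map((node) => node.href));

console.log(copiedNodes.length);  // 2
console.log(copiedNodes[0].href); // undefined
console.log(copiedHrefs[0]);      // https://example.com/project
```

In other words, the data has to be reduced to strings, numbers, or plain arrays and objects inside the page before it is returned.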
One workaround is:
let projects = await page.evaluate((sel) => {
  return document.getElementsByClassName(sel)[0].href;
}, 'homepage-project-image');
Remember that document.getElementsByClassName returns an HTMLCollection, so if you want to iterate over the results you need something like:
let projects = await page.evaluate((sel) => {
  return Array.from(document.getElementsByClassName(sel)).map(node => node.href);
}, 'homepage-project-image');
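Putting this together with the script from the question, a minimal sketch might look like the following. The URL, launch flags, and class name are taken from the question; collectHrefs is a hypothetical helper name, and whether the page still uses that class on its links is an assumption. The helper refers only to the page's own document, since the function handed to page.evaluate runs in the browser and cannot see variables from the Node.js scope.

```javascript
// Runs inside the page: converts the HTMLCollection into a plain array of
// href strings, which can be sent back to Node.js by value.
// Hypothetical helper name; it must not use anything from the Node.js scope.
function collectHrefs(className) {
  return Array.from(document.getElementsByClassName(className)).map((node) => node.href);
}

async function run() {
  // Required here so the helper above can be loaded without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.goto('https://divisare.com/');
    // Extra arguments to page.evaluate are passed to the page function.
    const links = await page.evaluate(collectHrefs, 'homepage-project-image');
    console.log(links.length);
    console.log(links[0]);
  } finally {
    // Close the browser even if navigation or evaluation throws.
    await browser.close();
  }
}
```

To try it, call run().catch(console.error) at the end of the file. Wrapping the work in try/finally is a design choice here so that a failed navigation does not leave a browser process running.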