Get all links with XPath in Puppeteer (pausing or not working)?


Problem description

I am required to use XPaths to select all links on a page, so that my Puppeteer app can then click into them and perform some actions. I am finding that the method (code below) sometimes gets stuck, and my crawler pauses. Is there a better or different way of getting all links from an XPath? Or is there something incorrect in my code that could be pausing my app's progress?

try {
  links = await this.getLinksFromXPathSelector(state);
} catch (e) {
  console.log("error getting links");
  return {...state, error: e};
}

Which calls:

async getLinksFromXPathSelector(state) {
  const newPage = state.page;
  // Run the XPath query inside the page and return the href of each match.
  const links = await newPage.evaluate((mySelector) => {
    const results = [];
    const query = document.evaluate(mySelector,
      document,
      null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    for (let i = 0, length = query.snapshotLength; i < length; ++i) {
      results.push(query.snapshotItem(i).href);
    }
    return results;
  }, state.linksSelector);
  return links;
}

The XPath is in state.linksSelector.

Recommended answer

You can use page.$x() to evaluate an XPath expression and obtain an ElementHandle array. It may be appropriate to call page.waitForXPath() beforehand to ensure that the elements specified by the XPath string have been added to the DOM.

Then you can pass the elements of the ElementHandle array to the page context via page.evaluate() and return an array containing the href value of each element.

const xpath_expression = '//a[@href]';
await page.waitForXPath(xpath_expression);
const links = await page.$x(xpath_expression);
const link_urls = await page.evaluate((...links) => {
  return links.map(e => e.href);
}, ...links);

console.log(link_urls);
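
Since the original problem was a crawler that appears to pause, it may help to bound the wait explicitly. The following is a minimal sketch rather than a definitive fix: getLinksByXPath and its timeoutMs parameter are hypothetical names, and the code assumes a Puppeteer version that still provides page.waitForXPath() and page.$x(). If nothing matches within the timeout, it returns an empty array so the caller can move on.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the href of every
// node matched by `xpath`, or an empty array if no match shows up in time.
// Assumes `page` is a Puppeteer Page exposing waitForXPath() and $x().
async function getLinksByXPath(page, xpath, timeoutMs = 5000) {
  try {
    // Bound the wait so a page without matching nodes cannot stall the crawl.
    await page.waitForXPath(xpath, { timeout: timeoutMs });
  } catch (err) {
    // Simplification: any failure while waiting (typically a timeout)
    // is treated as "no links found".
    return [];
  }

  // Evaluate the XPath and get one ElementHandle per matching node.
  const handles = await page.$x(xpath);

  // Read the href values inside the page context.
  const urls = await page.evaluate(
    (...elements) => elements.map((el) => el.href),
    ...handles
  );

  // Release the handles once the values have been copied out.
  await Promise.all(handles.map((handle) => handle.dispose()));
  return urls;
}
```

It could be called as `const link_urls = await getLinksByXPath(page, '//a[@href]', 10000);`. In recent Puppeteer releases page.$x() and page.waitForXPath() are deprecated or removed; there the same idea is expressed by prefixing the expression with `xpath/`, for example `page.waitForSelector('xpath/' + xpath)` and `page.$$('xpath/' + xpath)`.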
