Trouble clicking on different links using puppeteer

Question

I've written tiny scripts in node using puppeteer to click, one after another, on the links of different posts from the landing page of a website.

The site link used in my scripts is a placeholder. Moreover, its pages are not dynamic, so puppeteer might be overkill. However, my intention is to learn the logic of clicking.

When I execute my first script, it clicks once and then throws the following error as it navigates away from the source page.

const puppeteer = require("puppeteer");

(async () => {
    const browser = await puppeteer.launch({headless:false});
    const [page] = await browser.pages();
    await page.goto("https://stackoverflow.com/questions/tagged/web-scraping",{waitUntil:'networkidle2'});
    await page.waitFor(".summary");
    const sections = await page.$$(".summary");

    for (const section of sections) {
        await section.$eval(".question-hyperlink", el => el.click())
    }

    await browser.close();
})();

The error the above script encounters:

(node:9944) UnhandledPromiseRejectionWarning: Error: Execution context was destroyed, most likely because of a navigation.

When I execute the following, the script appears to click once (in reality it does not) and hits the same error as before.

const puppeteer = require("puppeteer");

(async () => {
    const browser = await puppeteer.launch({headless:false});
    const [page] = await browser.pages();
    await page.goto("https://stackoverflow.com/questions/tagged/web-scraping");

    await page.waitFor(".summary .question-hyperlink");
    const sections = await page.$$(".summary .question-hyperlink");

    for (let i=0, lngth = sections.length; i < lngth; i++) {
        await sections[i].click();
    }

    await browser.close();
})();

The error the above one throws:

(node:10128) UnhandledPromiseRejectionWarning: Error: Execution context was destroyed, most likely because of a navigation.

How can I let my script perform clicks cyclically?

Answer

Problem:

Execution context was destroyed, most likely because of a navigation.

The error says you tried to click a link, or do something else, on a page that no longer exists, most likely because you navigated away from it.

Logic:

Think of the puppeteer script as a real human browsing the real page.

First, we load the url (https://stackoverflow.com/questions/tagged/web-scraping).

Next, we want to go through all the questions asked on that page. What would we normally do? We would do one of the following:

  • Open one link in a new tab. Focus on that new tab, finish our work and come back to the original tab. Continue with the next link.
  • Click on a link, do our stuff, go back to the previous page, and continue with the next one.

Both of them involve moving away from and coming back to the current page.

If you don't follow this flow, you will get the error message as above.

Solution

There are at least four ways to resolve this. I will cover the simplest one and a more complex one.

Way: Link Extraction

First we extract all links on current page.

// $$eval passes an array of all matching elements to the callback
const links = await page.$$eval(".question-hyperlink", elements => elements.map(el => el.href));

This gives us a list of URLs. We can then open a new tab for each link.

for(let link of links){
  const newTab = await browser.newPage();
  await newTab.goto(link);
  // do the stuff
  await newTab.close();
}

This will go through the links one by one. We could speed this up with something like Bluebird's Promise.map or one of the various queue libraries, but you get the idea.
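As a rough illustration of the queueing idea, the sketch below caps how many tabs are open at the same time without pulling in a library. `mapWithConcurrency` and `collectTitles` are hypothetical helper names, not part of puppeteer. `browser` is assumed to be a puppeteer Browser, and only `newPage`, `goto`, `title` and `close` are called on it.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight at once.
// Results come back in the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner keeps claiming the next unprocessed index until none are left.
  const runner = async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  };

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}

// Hypothetical helper: open each link in its own tab and return the page titles.
// Assumes `browser` is a puppeteer Browser instance.
async function collectTitles(browser, links, limit = 3) {
  return mapWithConcurrency(links, limit, async link => {
    const tab = await browser.newPage();
    try {
      await tab.goto(link);
      return await tab.title();
    } finally {
      // Close the tab even if navigation failed, so tabs do not pile up.
      await tab.close();
    }
  });
}
```

Assuming `links` holds the URLs extracted earlier, `await collectTitles(browser, links, 3)` would keep at most three tabs open at any moment.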

Way: Coming back to main page

We need to store some state so we know which link we visited last. If we visited the third question and came back to the tag page, we need to visit the fourth question next, and so on.

Check the following code.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  await page.goto(
    `https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&pagesize=15`
  );

  const visitLink = async (index = 0) => {
    await page.waitForSelector("div.summary > h3 > a");

    // extract the links to click; we need to do this every time
    // because the context is destroyed once we navigate
    const links = await page.$$("div.summary > h3 > a");
    // assuming there are 15 questions on one page,
    // we stop at the 16th question, since it does not exist
    if (links[index]) {
      console.log("Clicking ", index);

      // Start waiting for the navigation and trigger the click together.
      // The entries must not be awaited individually, otherwise the click
      // finishes first and the navigation wait can miss the event.
      await Promise.all([
        page.waitForNavigation(),
        page.evaluate(element => {
          element.click();
        }, links[index])
      ]);

      // then make sure the post content has loaded as well
      await page.waitForSelector(".post-text");

      const currentPage = await page.title();
      console.log(index, currentPage);

      // go back and visit next link
      await page.goBack({ waitUntil: "networkidle0" });
      return visitLink(index + 1);
    }
    console.log("No links left to click");
  };

  await visitLink();

  await browser.close();
})();
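The same flow can also be written as a plain loop instead of a recursive function. The sketch below is one possible shape, not the only way to do it: `clickEachLink` and `onVisit` are hypothetical names, and `page` is assumed to be a puppeteer Page. Only `waitForSelector`, `$$`, `waitForNavigation`, `goBack` and the element handle's `click` are used.

```javascript
// Click every element matching `selector`, one at a time, returning to the
// listing page after each visit. Assumes `page` is a puppeteer Page.
async function clickEachLink(page, selector, onVisit) {
  for (let index = 0; ; index++) {
    await page.waitForSelector(selector);

    // Re-query on every pass: handles obtained before a navigation are stale.
    const links = await page.$$(selector);
    if (index >= links.length) break;

    // Start waiting for the navigation before the click can complete it.
    await Promise.all([page.waitForNavigation(), links[index].click()]);

    // Let the caller do its work on the post page.
    await onVisit(page, index);

    // Return to the listing so the next link can be clicked.
    await page.goBack();
  }
}
```

A call such as `await clickEachLink(page, "div.summary > h3 > a", async (p, i) => console.log(i, await p.title()))` would log each post title in turn, assuming the selector still matches the site's markup.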

EDIT: There are multiple questions similar to this one. I will be referencing them in case you want to learn more.
