Node.js Puppeteer infinite scroll loop


Question


I am learning Puppeteer and trying to scrape a website that implements infinite scroll. I am able to get all the prices from the list by scrolling down after a delay of 1 second. The URL is in the code below.

What I want to do is open an item from the list, get the product name, go back to the list, select the second product, and repeat this for all products.

const fs = require('fs');
const puppeteer = require('puppeteer');
function extractItems() {
  const extractedElements = document.querySelectorAll('.price');
  const items = [];
  for (let element of extractedElements) {
    items.push(element.innerText);
  }
  return items;
}
async function scrapeInfiniteScrollItems(
  page,
  extractItems,
  itemTargetCount,
  scrollDelay = 1000,
) {
  let items = [];
  try {
    let previousHeight;
    while (items.length < itemTargetCount) {
      items = await page.evaluate(extractItems);
      previousHeight = await page.evaluate('document.body.scrollHeight');
      await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
      await page.waitForFunction(`document.body.scrollHeight > ${previousHeight}`);
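      // page.waitFor() is deprecated (and missing from recent Puppeteer releases); a plain timer works everywhere.
      await new Promise(resolve => setTimeout(resolve, scrollDelay));
    }
  } catch (e) {
    // Most likely waitForFunction timed out because no new content loaded; return what was collected.
  }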
  return items;
}
(async () => {
  // Set up browser and page.
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 926 });
  // Navigate to the demo page.
  await page.goto('https://www.clubfactory.com/views/product.html?categoryId=53&subId=53&filter=%7B%22Price%22%3A%5B%7B%22beg%22%3A1.32%2C%22end%22%3A0%7D%5D%7D');
  // Scroll and extract items from the page.
  const items = await scrapeInfiniteScrollItems(page, extractItems, 4000);
  // Save extracted items to a file.
  fs.writeFileSync('./prices3.txt', items.join('\n') + '\n');
  // Close the browser.
  await browser.close();
})(); 

Any help is appreciated

Solution

EDIT: I added a working snippet for the particular website listed in the question.

When scraping, you sometimes have to break the user experience down into small steps and mimic a real user in order to get the data that user would actually see.

One easy way to deal with infinite scrolling is to remove all current elements and scroll until another 10 or 100 new elements have loaded each time, or even to try to scrape everything at once.
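The batch idea above could look roughly like this in Puppeteer. It is only a sketch under assumptions: `scrapeInBatches`, `itemSelector` and the option names are hypothetical, `page` is assumed to be an already opened Puppeteer page, and whether removing items and scrolling actually triggers more loading depends on how the site listens for scroll events.

```javascript
// Sketch: collect the visible items, remove them, then wait for the next batch.
// `page` is a Puppeteer page; `itemSelector` is a hypothetical selector for one list item.
async function scrapeInBatches(page, itemSelector, { batchSize = 10, maxItems = 100, timeout = 5000 } = {}) {
  const items = [];
  let exhausted = false;

  while (items.length < maxItems) {
    // Read the text of every item currently in the DOM and remove those items.
    const batch = await page.$$eval(itemSelector, els =>
      els.map(el => {
        const text = el.innerText;
        el.remove();
        return text;
      })
    );
    items.push(...batch);
    if (exhausted) break; // leftovers collected after a timeout, so stop here

    // Scroll so the page's scroll listeners get a chance to load more items.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    try {
      // Wait until at least `batchSize` new items have been rendered.
      await page.waitForFunction(
        (sel, n) => document.querySelectorAll(sel).length >= n,
        { timeout },
        itemSelector,
        batchSize
      );
    } catch (err) {
      // Assumed to be a timeout: do one last pass to pick up any partial batch.
      exhausted = true;
    }
  }
  return items.slice(0, maxItems);
}
```

Removing the collected items keeps the DOM small, which matters when the target count is in the thousands.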

But you can also approach it another way:

  1. get the first element,
  2. click to open in new tab,
  3. parse the data,
  4. close tab,
  5. remove the element,
  6. and move on to the next element, scrolling and waiting until a new element appears.
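The six steps above can be sketched with Puppeteer roughly as follows. This is a minimal sketch rather than tested code for the site in the question: `scrapeOneByOne` and its option names are hypothetical, `browser` and `page` are assumed to be the objects created in the question's script, and it assumes every list item contains a link (`<a href>`) to its product page, which may not hold on a site that navigates through click handlers.

```javascript
// Sketch only. `itemSelector`, `linkSelector` and `nameSelector` are hypothetical
// selectors; adjust them to the real markup of the target site.
async function scrapeOneByOne(browser, page, options) {
  const { itemSelector, linkSelector, nameSelector, maxItems = 50, timeout = 5000 } = options;
  const names = [];

  while (names.length < maxItems) {
    // 1. Get the first element still in the list (waiting for it to load if needed).
    try {
      await page.waitForSelector(itemSelector, { timeout });
    } catch (err) {
      break; // nothing appeared in time: assume the list is exhausted
    }

    // 2. Read the product link and open it in a new tab.
    const href = await page.$eval(`${itemSelector} ${linkSelector}`, a => a.href);
    const tab = await browser.newPage();
    try {
      await tab.goto(href, { waitUntil: 'domcontentloaded' });
      // 3. Parse the data (here: only the product name).
      names.push(await tab.$eval(nameSelector, el => el.innerText.trim()));
    } finally {
      // 4. Close the tab even if parsing failed.
      await tab.close();
    }

    // 5. Remove the processed element, 6. then scroll so the next one can load.
    await page.evaluate(sel => {
      const el = document.querySelector(sel);
      if (el) el.remove();
      window.scrollTo(0, 0);
      const next = document.querySelector(sel);
      if (next) next.scrollIntoView();
    }, itemSelector);
  }
  return names;
}
```

Opening the product in a separate tab leaves the list page and its scroll position untouched, so there is no need to navigate back and scroll down again.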

The problem with this concept is that you never know how scrolling and clicking are triggered. Different sites can bind multiple events to scrolling and handle them in different ways. Also, the site in question is built with Vue.js.

Code Snippet

The selector for each product is #__layout > section > main > section > section > div.products > div > div.

We will scroll the matching element into view, process it, then remove it. Afterwards we will trigger a scroll event so the page knows something has changed.

window.scrollTo(0, 0);
const selector = `#__layout > section > main > section > section > div.products > div > div`;
const element = document.querySelector(selector)
element.scrollIntoView()
element.remove()

Result: (gif animation)

What's cool is that we do not need to scroll to the bottom of the page to trigger the change. Note how the scrollbar changes during the removal.

This works on sites like Product Hunt as well.

const delay = d=>new Promise(r=>setTimeout(r,d))

const scrollAndRemove = async () => {
    // scroll to top to trigger the scroll events
    window.scrollTo(0, 0);
    const selector = `.title_9ddaf`;
    const element = document.querySelector(selector);

    // stop if there are no elements left
    if(element){
      element.scrollIntoView();

      // do my action
      // wait for a moment to reduce load or lazy loading image
      await delay(1000);
      console.log(element.innerText);
      // end of my action

      // remove the element to trigger some scroll event somewhere
      element.remove();

      // return another promise
      return scrollAndRemove()
    }
}

scrollAndRemove();
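The snippet above runs in the browser console. To drive the same idea from a Puppeteer script, one option is to pass a self-contained function to `page.evaluate` and return the collected text instead of logging it. This is a sketch under assumptions: `collectAndRemove`, `scrapeTitles` and the `maxItems` cap are hypothetical names added here, `page` is assumed to be an open Puppeteer page, and a loop replaces the recursion.

```javascript
// Runs inside the page. It must be self-contained because Puppeteer serializes
// the function and cannot see variables from the Node script.
async function collectAndRemove(selector, waitMs, maxItems) {
  const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
  const texts = [];

  while (texts.length < maxItems) {
    // Scroll to the top to trigger the scroll events.
    window.scrollTo(0, 0);
    const element = document.querySelector(selector);
    if (!element) break; // stop if there are no elements left

    element.scrollIntoView();
    // Wait a moment to reduce load and let lazy content render.
    await delay(waitMs);
    texts.push(element.innerText);

    // Remove the element so the page reacts and loads more.
    element.remove();
  }
  return texts;
}

// Runs in Node. `page` is an open Puppeteer page; the default selector is the
// one used in the console snippet.
async function scrapeTitles(page, selector = '.title_9ddaf', waitMs = 1000, maxItems = 100) {
  return page.evaluate(collectAndRemove, selector, waitMs, maxItems);
}
```

Like the console version, this stops as soon as no matching element is found, so a slow network can end the run early; the `maxItems` cap only guards against a list that never ends.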
