Puppeteer - update content of a scraped website

Published: 2021/4/23 19:36:41 · Tags: javascript node.js web-scraping command-line-interface puppeteer


Question


I want to create a CLI script with Node.js that scrapes a sports results website which does not provide an API. I know how to handle the scraping itself, but I have one doubt: is it possible to update the scraped content whenever a result changes and display it in a table in the terminal window?

Answer



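Here is a simplified example. The script opens https://time.is/ and logs the time to the console every time the site's clock element changes. It uses page.exposeFunction() to make a Node.js function callable from the page, and a MutationObserver to detect the changes in the page. The same pattern applies to a results page: observe the element that holds the scores and send its new content back to Node.js.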
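```javascript
// Run as an ES module (e.g. a .mjs file) so that `import` and top-level await work.
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();

try {
  // Reuse the blank tab the browser opens on launch.
  const [page] = await browser.pages();

  await page.goto('https://time.is/');
  // Make the Node.js updateTime() callable from the page as window.updateTime().
  await page.exposeFunction('updateTime', updateTime);

  await page.evaluate(() => {
    const clock = document.querySelector('#clock0_bg');
    if (!clock) throw new Error('Element #clock0_bg not found');

    let lastText;
    // Watch the element and its descendants for any kind of change.
    const config = { subtree: true, childList: true, attributes: true, characterData: true };
    const callback = function () {
      // Forward the text only when it actually changed (attribute-only mutations are skipped).
      const text = clock.innerText;
      if (text !== lastText) {
        lastText = text;
        window.updateTime(text);
      }
    };
    const observer = new MutationObserver(callback);
    observer.observe(clock, config);
  });
  // The browser is left open on purpose so the observer keeps reporting changes.
} catch (err) {
  console.error(err);
  await browser.close();
}

// Runs in Node.js every time the page reports a new value.
function updateTime(time) {
  console.log(time);
}
```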
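The question also asks about showing the results as a table in the terminal. One simple approach is to keep the latest rows in memory and, on every update, clear the screen and print the whole table again. Node.js's built-in console.table(rows) can already print an array of objects as a table; the sketch below formats the table itself instead, so the column layout is predictable. The row shape ({ home, away, score }) and the names formatTable and redraw are hypothetical; how you extract the rows depends on the markup of the site you scrape.

```javascript
// Hypothetical row shape: { home: string, away: string, score: string }.
// Builds a plain-text table whose columns are padded to the widest cell.
function formatTable(rows, columns) {
  const widths = columns.map((col) =>
    Math.max(col.length, ...rows.map((row) => String(row[col] ?? '').length))
  );
  // Joins one row of cells, padding each cell to its column width.
  const line = (cells) =>
    cells.map((cell, i) => String(cell).padEnd(widths[i])).join(' | ');
  const separator = widths.map((w) => '-'.repeat(w)).join('-+-');
  return [
    line(columns),
    separator,
    ...rows.map((row) => line(columns.map((col) => row[col] ?? ''))),
  ].join('\n');
}

// Redraws the whole table: clear the terminal, then print the fresh table.
function redraw(rows) {
  console.clear();
  console.log(formatTable(rows, ['home', 'away', 'score']));
}
```

For example, the exposed function in the script above could receive an array of such row objects (built in the page by the MutationObserver callback from the results table) and call redraw(rows) instead of console.log. Redrawing everything is crude but keeps the terminal output consistent with the latest scraped state.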

