Puppeteer - update content of a scraped website
Question
I want to create a CLI script with Node.js that scrapes a sports results website that does not provide an API. I know how to handle the scraping itself, but I have one doubt: is it possible to update the scraped content when a result changes and display it in a table in the terminal window?
Recommended answer
Here is a simplified example. The script opens https://time.is/ and logs the time to the console each time the site's clock element changes. It uses page.exposeFunction(), which makes a Node.js function callable from inside the page, and MutationObserver, which runs a callback whenever the observed DOM element changes.
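    // Run as an ES module (.mjs or "type": "module"), since it uses top-level await.
    import puppeteer from 'puppeteer';

    const browser = await puppeteer.launch();

    try {
      const [page] = await browser.pages();
      await page.goto('https://time.is/');

      // Make updateTime() callable from the page as window.updateTime().
      await page.exposeFunction('updateTime', updateTime);

      await page.evaluate(() => {
        const clock = document.querySelector('#clock0_bg');
        const config = { subtree: true, childList: true, attributes: true, characterData: true };
        // Send the clock text to Node.js on every change inside the element.
        const callback = function () { window.updateTime(clock.innerText); };
        const observer = new MutationObserver(callback);
        observer.observe(clock, config);
      });
      // The browser is left open on purpose so the observer keeps reporting.
    } catch (err) {
      console.error(err);
      // Close the browser on failure so the process can exit.
      await browser.close();
    }

    // Runs in Node.js each time the page reports a change.
    function updateTime(time) {
      console.log(time);
    }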
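The question also asks about showing the results in a table in the terminal. The following is only a minimal sketch of one way to do that, not part of the original answer: it keeps the latest results in a Map and redraws the table with Node's built-in console.clear() and console.table() whenever a score actually changes. The names results, applyUpdate, render and onUpdate, the shape of the update object, and the sample teams are all assumptions made for illustration; the updates are simulated here instead of coming from a real page.

```javascript
// Hypothetical in-memory store of the latest results, keyed by match id.
const results = new Map();

// Store a scraped result; returns true only when the score is new or changed.
function applyUpdate(id, home, away, score) {
  const previous = results.get(id);
  if (previous && previous.score === score) return false;
  results.set(id, { home, away, score });
  return true;
}

// Redraw the whole table in place (console.clear() only clears a real TTY).
function render() {
  console.clear();
  console.table([...results.values()]);
}

// Callback that could be passed to page.exposeFunction() in place of updateTime.
function onUpdate(update) {
  if (applyUpdate(update.id, update.home, update.away, update.score)) {
    render();
  }
}

// Simulated updates standing in for calls made from the page (made-up data).
onUpdate({ id: 'm1', home: 'Lions', away: 'Tigers', score: '0-0' });
onUpdate({ id: 'm1', home: 'Lions', away: 'Tigers', score: '1-0' });
```

In a real script, the MutationObserver callback in the page would collect the fields for the changed row and pass them as a plain object to the exposed function; the selectors needed for that depend entirely on the target site.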