Crawling multiple URLs in a loop using Puppeteer
Question

I have an array of URLs to scrape data from:
urls = ['url','url','url'...]
This is what I am doing:
urls.map(async (url) => {
  await page.goto(url);
  await page.waitForNavigation({ waitUntil: 'networkidle' });
});
This seems to not wait for page load and visits all the URLs quite rapidly (I even tried using page.waitFor).
I wanted to know if I am doing something fundamentally wrong, or if this type of functionality is not advised/supported.
Answer
map, forEach, reduce, etc., do not wait for the asynchronous operation inside their callbacks before proceeding to the next element of the iterable they are iterating over.
There are multiple ways of going through each item of an iterable sequentially while performing an asynchronous operation, but the easiest in this case, I think, is to simply use a normal for loop, which does wait for each operation to finish.
const urls = [...]

for (let i = 0; i < urls.length; i++) {
  const url = urls[i];
  // page.goto already waits for the navigation to settle, so pass
  // waitUntil here; a separate waitForNavigation call after goto would
  // wait for a *subsequent* navigation and eventually time out.
  await page.goto(url, { waitUntil: 'networkidle2' });
}
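The same serial behavior can be sketched with a for...of loop and a mock in place of page.goto (mockGoto is an illustrative stand-in, not a Puppeteer API): because the loop body awaits each visit, starts and ends strictly alternate instead of all starting at once.

```javascript
// Record start/end markers to prove the visits run one at a time.
const log = [];

async function mockGoto(url) {
  log.push(`start ${url}`);
  await new Promise((resolve) => setTimeout(resolve, 5));
  log.push(`end ${url}`);
}

async function crawl(urls) {
  for (const url of urls) {
    await mockGoto(url); // next iteration begins only after this resolves
  }
}

const serial = crawl(['p1', 'p2']);
```

Here `log` ends up as `start p1, end p1, start p2, end p2` — each navigation fully completes before the next begins.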
This would visit one URL after another, as you expect. If you are curious about iterating serially using async/await, you can have a peek at this answer: https://stackoverflow.com/a/24586168/791691
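The linked answer's core idiom can be sketched like this, again with a mock (fakeVisit is a hypothetical stand-in for page.goto): reduce builds a single promise chain, so each URL is visited only after the previous one has finished.

```javascript
// Collect URLs in the order they complete.
const visited = [];

function fakeVisit(url) {
  return new Promise((resolve) =>
    setTimeout(() => {
      visited.push(url);
      resolve();
    }, 5)
  );
}

// Each step chains onto the previous promise, serializing the visits.
const finished = ['u1', 'u2', 'u3'].reduce(
  (chain, url) => chain.then(() => fakeVisit(url)),
  Promise.resolve()
);
```

When `finished` resolves, `visited` is `['u1', 'u2', 'u3']` in order. In modern code a plain for or for...of loop with await is usually clearer than this chaining pattern, but the technique predates async/await and still works.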