Node Puppeteer, page.on( "request" ) throws "Request is already handled!"
Question
I'm using puppeteer-extra and Node.js to iterate across multiple URLs.
I'm trying to intercept requests for certain resource types on each iteration, and I get the following error.
PS C:\Users\someuser\Desktop\Project> node temp.js
-- running
C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\assert.js:26
throw new Error(message);
^
Error: Request is already handled!
at Object.exports.assert (C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\assert.js:26:15)
at HTTPRequest.continue (C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\HTTPRequest.js:217:21)
at PuppeteerBlocker.onRequest (C:\Users\someuser\node_modules\@cliqz\adblocker-puppeteer\dist\cjs\adblocker.js:225:33)
at BlockingContext.onRequest (C:\Users\someuser\node_modules\@cliqz\adblocker-puppeteer\dist\cjs\adblocker.js:64:47)
at C:\Users\someuser\node_modules\puppeteer\lib\cjs\vendor\mitt\src\index.js:51:62
at Array.map (<anonymous>)
at Object.emit (C:\Users\someuser\node_modules\puppeteer\lib\cjs\vendor\mitt\src\index.js:51:43)
at Page.emit (C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\EventEmitter.js:72:22)
at C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\Page.js:143:100
at C:\Users\someuser\node_modules\puppeteer\lib\cjs\vendor\mitt\src\index.js:51:62
I'm having trouble understanding why the request would already be handled, since the actual request, page.goto, is made inside the for loop. Does anyone have any hints?
Here is the complete project:
const puppeteer = require( 'puppeteer-extra' );
const StealthPlugin = require( 'puppeteer-extra-plugin-stealth' );
puppeteer.use( StealthPlugin() );
const AdblockerPlugin = require( 'puppeteer-extra-plugin-adblocker' );
puppeteer.use( AdblockerPlugin( { blockTrackers: true } ) );
puppeteer.launch( { headless: true } ).then( async browser => {
    console.log( '--\xa0running' );
    console.time( '--\xa0process' );
    const page = await browser.newPage();
    await page.setRequestInterception( true );
    page.on( 'request', ( request ) => {
        if ( [ 'image', 'stylesheet', 'font', 'script' ].indexOf( request.resourceType() ) ) {
            request.abort();
        } else {
            request.continue();
        }
    } );
    for ( var i = 1; i <= 20; i++ ) {
        console.time( '--\xa0iteration\xa0' + i ); // ... timer start
        await page.goto( 'https://www.someurl.it/shop/s%2D' + i, { waitUntil: 'load' } );
        const title = await page.title();
        console.log( title.includes( '404' ) ? false : title );
        console.timeEnd( '--\xa0iteration\xa0' + i ); // ... timer end
    }
    await browser.close();
    console.timeEnd( '--\xa0process' );
    console.log( '--\xa0ending' );
} );
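A side note on the condition in the request handler above, separate from the error being asked about: indexOf returns -1 when the value is not found and 0 when it is the first element, so using its result directly as a boolean inverts the test for those cases. The sketch below only illustrates that behaviour; the helper names blockedByTruthiness and blockedByMembership are made up for the example and are not part of the original project.

```javascript
// Resource types copied from the handler in the question.
const blocked = [ 'image', 'stylesheet', 'font', 'script' ];

// The test as written in the question: the raw indexOf result is used
// as a boolean, so -1 (not found) is truthy and 0 (first item) is falsy.
const blockedByTruthiness = ( type ) => Boolean( blocked.indexOf( type ) );

// An explicit membership test.
const blockedByMembership = ( type ) => blocked.includes( type );

for ( const type of [ 'image', 'stylesheet', 'document' ] ) {
    console.log( type, blockedByTruthiness( type ), blockedByMembership( type ) );
}
// image false true
// stylesheet true true
// document true false
```

With the truthiness version, 'image' requests would be continued and 'document' requests would be aborted, which is probably not what was intended.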
Recommended answer
I've since found a solution.
I create a constant named brewery, outside the main async function, that sets up the request interception; then, inside the main async function, we simply await it.
Note that this version requires plain puppeteer, so the adblocker plugin's own request handler seen in the stack trace (PuppeteerBlocker.onRequest) is no longer registered.
/**
 * Puppeteer, Headless Chrome Node.js API
 *
 * @link https://github.com/puppeteer/puppeteer
 *
 * @package npm install puppeteer
 */
const puppeteer = require( 'puppeteer' );
const brewery = async ( page ) => {
    await page.setRequestInterception( true );
    page.on( 'request', r => {
        /**
         * @see https://stackoverflow.com/a/47166637/3645650
         */
        if ( [
            'stylesheet',
            'image',
            'media',
            'font',
            'script',
            'texttrack',
            'xhr',
            'fetch',
            'eventsource',
            'websocket',
            'manifest',
            'other',
        ].indexOf( r.resourceType() ) !== -1 ) {
            r.abort();
        } else {
            r.continue();
        }
    } );
};
( async () => {
    // ... start
    let start = new Date();
    console.log( '--\xa0process:\xa0start' );
    const browser = await puppeteer.launch( {
        headless: true
    } );
    const page = await browser.newPage();
    await brewery( page );
    await page.goto( 'https://github.com/login' );
    await page.screenshot( { path: Date.now() + '.png' } );
    console.log( '--\xa0process:\xa0screenshot' );
    // ... end
    await browser.close().then( () => {
        var end = ( new Date() - start ) / 1000;
        console.log( '--\xa0process:\xa0end,\xa0runtime\xa0' + end + '\xa0seconds' );
    } );
} )();
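If more than one listener has to stay registered on the same page (for example a plugin's handler alongside your own), one possible approach is to check whether the request has already been resolved before calling abort or continue. The sketch below assumes the installed Puppeteer version exposes request.isInterceptResolutionHandled(); makeGuardedHandler and fakeRequest are hypothetical names introduced only for this illustration, and fakeRequest is a stand-in object so the example runs without launching a browser.

```javascript
// Hypothetical helper: builds a 'request' listener that leaves alone any
// request another listener has already resolved.
const makeGuardedHandler = ( blockedTypes ) => ( request ) => {
    // Assumption: this method is available in the installed Puppeteer version.
    if ( request.isInterceptResolutionHandled() ) {
        return 'skipped';
    }
    if ( blockedTypes.includes( request.resourceType() ) ) {
        request.abort();
        return 'aborted';
    }
    request.continue();
    return 'continued';
};

// Stand-in for a Puppeteer HTTPRequest, for demonstration only.
const fakeRequest = ( type, handled = false ) => ( {
    resourceType: () => type,
    isInterceptResolutionHandled: () => handled,
    abort: () => { handled = true; },
    continue: () => { handled = true; },
} );

const handler = makeGuardedHandler( [ 'image', 'font' ] );
console.log( handler( fakeRequest( 'image' ) ) );          // aborted
console.log( handler( fakeRequest( 'document' ) ) );       // continued
console.log( handler( fakeRequest( 'document', true ) ) ); // skipped
```

With a real page this would be registered as page.on( 'request', makeGuardedHandler( [ 'image', 'font' ] ) ). The guard only protects this listener: another listener that resolves the request afterwards without a similar check could still throw the same error.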