Node Puppeteer, page.on( "request" ) throws "Request is already handled!"


Problem description

I'm using puppeteer-extra and Node.js to iterate across multiple URLs.

I'm trying to intercept certain resource types on each iteration, and I get the following error.

PS C:\Users\someuser\Desktop\Project> node temp.js
-- running
C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\assert.js:26
        throw new Error(message);
              ^

Error: Request is already handled!
    at Object.exports.assert (C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\assert.js:26:15)
    at HTTPRequest.continue (C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\HTTPRequest.js:217:21)
    at PuppeteerBlocker.onRequest (C:\Users\someuser\node_modules\@cliqz\adblocker-puppeteer\dist\cjs\adblocker.js:225:33)
    at BlockingContext.onRequest (C:\Users\someuser\node_modules\@cliqz\adblocker-puppeteer\dist\cjs\adblocker.js:64:47)
    at C:\Users\someuser\node_modules\puppeteer\lib\cjs\vendor\mitt\src\index.js:51:62
    at Array.map (<anonymous>)
    at Object.emit (C:\Users\someuser\node_modules\puppeteer\lib\cjs\vendor\mitt\src\index.js:51:43)
    at Page.emit (C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\EventEmitter.js:72:22)
    at C:\Users\someuser\node_modules\puppeteer\lib\cjs\puppeteer\common\Page.js:143:100
    at C:\Users\someuser\node_modules\puppeteer\lib\cjs\vendor\mitt\src\index.js:51:62

I'm having trouble understanding why the request would already be handled, since the actual request, page.goto, is made inside the for loop. Does anyone have any hints?

Here is the complete project:

const puppeteer = require( 'puppeteer-extra' );

const StealthPlugin = require( 'puppeteer-extra-plugin-stealth' );
puppeteer.use( StealthPlugin() );

const AdblockerPlugin = require( 'puppeteer-extra-plugin-adblocker' );
puppeteer.use( AdblockerPlugin( { blockTrackers: true } ) );

puppeteer.launch( { headless: true } ).then( async browser => {

    console.log( '--\xa0running' );

    console.time( '--\xa0process' );

    const page = await browser.newPage();

    await page.setRequestInterception( true );
    
    page.on( 'request', ( request ) => {
        if ( [ 'image', 'stylesheet', 'font', 'script' ].indexOf( request.resourceType() ) ) {
            request.abort();
        } else {
            request.continue();
        };
    } );

    for ( var i = 1; i <= 20; i++ ) {

        console.time( '--\xa0iteration\xa0' + i ); // ... timer start 
    
        await page.goto( 'https://www.someurl.it/shop/s%2D' + i, { waitUntil: 'load' } );
    
        const title = await page.title();
    
        console.log( title.includes( '404' ) ? false : title );
    
        console.timeEnd( '--\xa0iteration\xa0' + i ); // ... timer end 
    
    };

    await browser.close();

    console.timeEnd( '--\xa0process' );
  
    console.log( '--\xa0ending' );

} );

Recommended answer

I've since found a solution.

I define a constant called brewery outside of the main async function; it holds an async function that sets up request interception for a page. Then, inside the main async function, we simply await brewery( page ).

/**
 * Puppeteer, Headless Chrome Node.js API
 * 
 * @link https://github.com/puppeteer/puppeteer
 * 
 * @package npm install puppeteer
 */
const puppeteer = require( 'puppeteer' );

const brewery = async ( page ) => {

    await page.setRequestInterception( true );

    page.on( 'request', r => {

        /**
         * @see https://stackoverflow.com/a/47166637/3645650
         */
        if ( [
            'stylesheet', 
            'image', 
            'media', 
            'font', 
            'script', 
            'texttrack', 
            'xhr', 
            'fetch', 
            'eventsource', 
            'websocket', 
            'manifest', 
            'other',
        ].indexOf( r.resourceType() ) !== -1 ) {

            r.abort();

        } else {

            r.continue();

        };

    } );

};

( async () => {

    // ... start
    let start = new Date();
    console.log( '--\xa0process:\xa0start' );

    const browser = await puppeteer.launch( { 
        headless: true 
    } );

    const page = await browser.newPage();
    
    await brewery( page );

    await page.goto( 'https://github.com/login' );
    await page.screenshot( { path: Date.now() + '.png' } );
    console.log( '--\xa0process:\xa0screenshot' );

    // ... end
    await browser.close();
    const end = ( new Date() - start ) / 1000;
    console.log( '--\xa0process:\xa0end,\xa0runtime\xa0' + end + '\xa0seconds' );

} )();
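
A note on why the original error may have appeared: the stack trace in the question shows PuppeteerBlocker.onRequest from @cliqz/adblocker-puppeteer calling HTTPRequest.continue, which suggests that the adblocker plugin and the custom page.on( 'request' ) listener both tried to resolve the same request. The code above avoids that by using plain puppeteer with a single listener. Separately, the condition in the question relies on the truthiness of indexOf(), which returns -1 (truthy) for a missing type and 0 (falsy) for the first entry, so 'document' would be aborted and 'image' would continue. The sketch below is one possible way to guard against both issues. The names BLOCKED_TYPES, shouldBlock and handleRequest are hypothetical, and it assumes the installed Puppeteer version may expose request.isInterceptResolutionHandled(), which older releases do not have.

```javascript
// Resource types to block; a Set makes the membership test explicit
// and avoids relying on the truthiness of indexOf().
const BLOCKED_TYPES = new Set( [ 'image', 'stylesheet', 'font', 'script' ] );

// Returns true when a request of this resource type should be aborted.
const shouldBlock = ( resourceType ) => BLOCKED_TYPES.has( resourceType );

// Hypothetical 'request' listener: it leaves a request alone when another
// listener (for example a plugin) has already resolved it.
const handleRequest = ( request ) => {

    // Assumption: isInterceptResolutionHandled() only exists in newer
    // Puppeteer releases, so treat the request as unhandled when it is missing.
    const alreadyHandled =
        typeof request.isInterceptResolutionHandled === 'function' &&
        request.isInterceptResolutionHandled();

    if ( alreadyHandled ) {
        return 'skipped';
    }

    if ( shouldBlock( request.resourceType() ) ) {
        request.abort();
        return 'aborted';
    }

    request.continue();
    return 'continued';

};

// Usage, assuming page is a Puppeteer Page:
//     await page.setRequestInterception( true );
//     page.on( 'request', handleRequest );
```

The listener returns a short status string only to make its decision easy to log or test; Puppeteer ignores the return value. On a Puppeteer version without isInterceptResolutionHandled(), the guard has no effect, and the safer option is to keep a single listener responsible for resolving requests.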
