puppeteer being redirected when browser is not


Problem description

I am attempting to test the page https://publicindex.sccourts.org/anderson/publicindex/. When I navigate to it with a standard browser, the navigation ends at the requested page (https://publicindex.sccourts.org/anderson/publicindex/), which displays an "accept" button.

However, when testing with puppeteer in headless mode, the request is redirected to https://publicindex.sccourts.org.

I have a rough idea of what is occurring, but I cannot seem to prevent the redirection to https://publicindex.sccourts.org when the page is requested using puppeteer. Here is what I believe occurs with the user-controlled browser:

  1. The request for the page is sent (assuming a first visit).

  2. The response is pure JS.

  3. The JS code then:

    copies the initial page request headers,

    adds a specific header and re-requests the same page (XHR),

    copies a URL from one of the response headers and replaces the location with it.

    (or)

    checks the page history,

    adds the URL from the response to the page history,

    opens a new window,

    writes the XHR response to the new page,

    closes the new window,

    adds an event listener for a function in the returned XHR response,

    fires the event.
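The first branch of step 3 can be sketched roughly as it might look inside the page's own script. Everything here is an assumption: the question does not name the extra header or the response header the URL is copied from, so `extraHeaderName`, `extraHeaderValue` and `locationHeaderName` are placeholders, and `reRequestAndRedirect` is a hypothetical name. Page JS cannot actually read the headers of the original document request, so the sketch simply starts from whatever header object it is given. `fetch` stands in for XMLHttpRequest for brevity, and `fetchImpl`/`navigate` are injected so the logic can run outside a browser (in the page they would be `window.fetch` and `u => window.location.replace(u)`).

```javascript
/**
 * Hypothetical reconstruction of step 3, first branch: re-request the same URL
 * with one extra header, then navigate to a URL taken from a response header.
 * All header names are placeholders; the real script's names are unknown.
 */
async function reRequestAndRedirect({
  url,
  baseHeaders,        // headers to copy (assumed input; see lead-in)
  extraHeaderName,    // hypothetical header added by the page script
  extraHeaderValue,
  locationHeaderName, // hypothetical response header carrying the next URL
  fetchImpl,          // window.fetch in a browser; injected for testing
  navigate,           // e.g. (u) => window.location.replace(u) in a browser
}) {
  // Copy the headers rather than mutating the caller's object, then add the extra one.
  const headers = { ...baseHeaders, [extraHeaderName]: extraHeaderValue };

  // Re-request the same page (fetch used here in place of XMLHttpRequest).
  const response = await fetchImpl(url, { headers, credentials: 'same-origin' });

  // Headers.get() returns null when the header is absent.
  const nextUrl = response.headers.get(locationHeaderName);
  if (nextUrl) {
    navigate(nextUrl); // the "replace the location" step
  }
  return nextUrl;
}
```

If this guess is close, the interesting part for puppeteer is the second request: it is the one whose response decides where the page goes next.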

With puppeteer I have tried tracing the JS, recording a HAR, monitoring cookies, watching the request chain, intercepting page requests and adjusting headers, watching the history, etc. I'm stumped.
Here is the most basic version of the puppeteer script:

function run () {
    let url = 'https://publicindex.sccourts.org/anderson/publicindex/';
    const puppeteer = require('puppeteer');
    const PuppeteerHar = require('puppeteer-har');
    puppeteer.launch({headless: true}).then(async browser => {
        const page = await browser.newPage();
        await page.setJavaScriptEnabled(true);
        await page.setViewport({width: 1920, height: 1280});
        // Send a desktop Chrome user agent instead of the default headless one
        await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
        // Record all network traffic to results.har
        const har = new PuppeteerHar(page);
        await har.start({path: 'results.har'});
        // Load the page, then wait for the navigation its JS triggers
        const response = await page.goto(url);
        await page.waitForNavigation();
        await har.stop();
        // Print the HTML of wherever the navigation ended up
        let bodyHTML = await page.content();
        console.log(bodyHTML);
        await browser.close();
    });
};
run();

Why can I not get puppeteer to simply replicate what the page's JS does when I navigate to it in Chrome, and end the navigation on the "accept" page?
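One thing worth ruling out in the basic script: `page.goto()` resolves once the `load` event fires (its default), and `page.waitForNavigation()` then waits for the next navigation with a 30-second default timeout. If the script-driven redirect has already happened by then, the wait may time out or attach to a different navigation than intended. A minimal sketch of an alternative, assuming a Puppeteer `Page` (which emits `framenavigated` and exposes `mainFrame()`); `waitForMainFrameUrl` is a hypothetical helper, not part of Puppeteer's API.

```javascript
/**
 * Hypothetical helper: resolve with the main frame's URL once it navigates to a
 * URL accepted by `predicate`; reject after `timeoutMs`. Register it BEFORE
 * page.goto() so no navigation is missed.
 */
function waitForMainFrameUrl(page, predicate, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const onNavigated = (frame) => {
      if (frame !== page.mainFrame()) return; // ignore iframe navigations
      const url = frame.url();
      if (predicate(url)) {
        cleanup();
        resolve(url);
      }
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`main frame did not reach a matching URL within ${timeoutMs} ms`));
    }, timeoutMs);
    // Stop the timer and detach the listener, whichever way the wait ends.
    function cleanup() {
      clearTimeout(timer);
      page.off('framenavigated', onNavigated);
    }
    page.on('framenavigated', onNavigated);
  });
}
```

For example, to see where the main frame goes after the initial load (assuming the first commit is the requested URL itself): `const next = waitForMainFrameUrl(page, u => u !== url); await page.goto(url); console.log('main frame moved to', await next);`.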

Here is a version with more verbose logging:

function run () {
    let url = 'https://publicindex.sccourts.org/anderson/publicindex/';
    const puppeteer = require('puppeteer');
    const PuppeteerHar = require('puppeteer-har');
    puppeteer.launch().then(async browser => {

        const page = await browser.newPage();

        await page.setJavaScriptEnabled(true);
        await page.setViewport({width:1920,height:1280});
        await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
        await page.setRequestInterception(true);

        // Log frame and request lifecycle events
        page.on('frameattached', frame =>{ console.log('frame attached ');});
        page.on('framedetached', frame =>{ console.log('frame detached ');});
        page.on('framenavigated', frame =>{ console.log('frame navigated ' + frame.url()); });
        page.on('requestfailed', req =>{ console.log('request failed ' + req.url());});
        page.on('requestfinished', req =>{ console.log('request finished ' + req.url());});

        let count = 0;
        page.on('request', interceptedRequest => {
            console.log('requesting ' + count + ' times');
            console.log('request for ' + interceptedRequest.url());
            console.log(interceptedRequest);
            // Abort everything after the first three requests
            if (count > 2) {
                interceptedRequest.abort();
                return;
            }
            if (interceptedRequest.url() == url) {
                count++;
                if (count == 1) {
                    // First request for the page: add extra headers
                    const headers = interceptedRequest.headers();
                    headers['authority'] = 'publicindex.sccourts.org';
                    headers['sec-fetch-dest'] = 'empty';
                    headers['sec-fetch-mode'] = 'cors';
                    headers['sec-fetch-site'] = 'same-origin';
                    headers['upgrade-insecure-requests'] = '1';
                    interceptedRequest.continue({headers});
                } else {
                    interceptedRequest.continue();
                }
                return;
            }
            count++;
            interceptedRequest.continue();
        });

        const har = new PuppeteerHar(page);
        await har.start({ path: 'results.har' });
        await page.tracing.start({path: 'trace.json'});
        await page.coverage.startJSCoverage({reportAnonymousScripts: true});
        const response = await page.goto(url);
        const session = await page.target().createCDPSession();
        await session.send('Page.enable');
        await session.send('Page.setWebLifecycleState', {state: 'active'});
        const jsCoverage = await page.coverage.stopJSCoverage();
        console.log(jsCoverage);
        // redirectChain() returns Request objects, so print their URLs
        const chain = response.request().redirectChain();
        console.log(chain.map(req => req.url()).join('\n') + "\n\n");
        await page.waitForNavigation();
        await har.stop();
        await page.tracing.stop(); // trace.json is only written once tracing stops
        let bodyHTML = await page.content();
        console.log(bodyHTML);
        await browser.close();
    });
};

run();
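The verbose script edits the object returned by `interceptedRequest.headers()` in place and adds an `authority` header. A non-mutating way to build the override object for `interceptedRequest.continue({ headers })` is sketched below; `withHeaderOverrides` is a hypothetical helper. Skipping names that start with `:` is an assumption: in Chrome DevTools, `authority` usually appears as the HTTP/2 pseudo-header `:authority`, which the browser derives from the request URL, so copying it as an ordinary header is unlikely to change how the server sees the request.

```javascript
/**
 * Hypothetical helper: merge `overrides` into a copy of `original` for
 * interceptedRequest.continue({ headers }). Names are lower-cased so an
 * override replaces the original regardless of case; values become strings.
 * Pseudo-headers such as ':authority' are skipped (an assumption; see lead-in).
 */
function withHeaderOverrides(original, overrides) {
  const result = {};
  // Spread order means override entries are visited after the originals and win.
  for (const [name, value] of Object.entries({ ...original, ...overrides })) {
    const key = name.toLowerCase();
    if (key.startsWith(':')) continue; // derived from the URL/method, not sent as a header
    result[key] = String(value);
  }
  return result;
}
```

Inside the `request` handler this would read `interceptedRequest.continue({ headers: withHeaderOverrides(interceptedRequest.headers(), { 'upgrade-insecure-requests': '1' }) });`, leaving the original header object untouched.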

Solution

I don't have a full solution, but I know where the redirection happens.

I tested your script locally with the following:

const puppeteer = require('puppeteer');
const PuppeteerHar = require('puppeteer-har');

function run () {
    let url = 'https://publicindex.sccourts.org/anderson/publicindex/';
    // Headful, with DevTools opened automatically so the network panel can be watched
    puppeteer.launch({headless: false, devtools: true }).then(async browser => {
        const page = await browser.newPage();
        // Intercept every request, log its URL, and let it through unchanged
        await page.setRequestInterception(true);
        page.on('request', request => {
            console.log('GOT NEW REQUEST', request.url());
            request.continue();
        });

        // Log each response's status and headers (this is where the 302 shows up)
        page.on('response', response => {
            console.log('GOT NEW RESPONSE', response.status(), response.headers());
        });
        await page.setJavaScriptEnabled(true);
        await page.setViewport({width: 1920, height: 1280});
        await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
        const har = new PuppeteerHar(page);
        await har.start({path: 'results.har'});
        const response = await page.goto(url);
        await page.waitForNavigation();
        await har.stop();
        let bodyHTML = await page.content();
    });
};
run();

I edited three parts:

  • Disabled headless mode and opened DevTools automatically
  • Intercepted all network requests so I could audit them
  • Hoisted the require calls to the top of the file, because nesting them hurts my eyes; I always see them called at the top level

It turns out that the page https://publicindex.sccourts.org/anderson/publicindex/ makes a request to https://publicindex.sccourts.org/.

However, this request returns a 302 redirect whose Location is https://www.sccourts.org/caseSearch/, so the browser follows it.
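To spot that 302 without reading through every logged header, the `response` listener in the script above could pass each response through a small filter. A minimal sketch; `redirectInfo` is a hypothetical name, and it relies only on `HTTPResponse.status()`, `headers()` and `url()`, plus Puppeteer lower-casing response header names. 304 Not Modified is excluded because it is not a redirect, and a relative Location is resolved against the response URL.

```javascript
/**
 * Hypothetical helper: describe a Puppeteer HTTPResponse if it is an HTTP
 * redirect, otherwise return null.
 */
function redirectInfo(response) {
  const status = response.status();
  if (status < 300 || status >= 400 || status === 304) return null;
  // Puppeteer lower-cases response header names.
  const location = response.headers()['location'];
  return {
    from: response.url(),
    status,
    // Location may be relative, so resolve it against the response URL.
    to: location ? new URL(location, response.url()).href : null,
  };
}
```

Usage: `page.on('response', r => { const info = redirectInfo(r); if (info) console.log('REDIRECT', info); });`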

I would investigate whether this odd request is legitimate, and why it redirects in puppeteer's Chromium.

This post might help; there could be something in Chromium that causes the request to be treated as insecure.

I also tried passing args: ['--disable-web-security', '--allow-running-insecure-content'] in the launch() options object, but with no result.

Please let us know how it goes! HAR has been fun to discover!
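Since the scripts above already write results.har via puppeteer-har, the same 302 should also be visible offline. A minimal sketch, assuming the file follows the HAR 1.2 layout (`log.entries[]` with `request.url`, `response.status`, `response.redirectURL` and `response.headers`) and that 3xx responses are recorded as their own entries; `loadHar` and `findRedirects` are hypothetical names. Only HTTP-level redirects are caught: a script that assigns `window.location` shows up as an ordinary new request.

```javascript
const fs = require('fs');

// Hypothetical helper: read a HAR file such as the results.har written by puppeteer-har.
function loadHar(path) {
  return JSON.parse(fs.readFileSync(path, 'utf8'));
}

// Hypothetical helper: list the redirect entries of a HAR 1.2 object as { from, status, to }.
// Uses response.redirectURL, falling back to the Location header when that field is empty.
function findRedirects(har) {
  const entries = (har && har.log && har.log.entries) || [];
  return entries
    .filter((entry) => {
      const status = entry.response && entry.response.status;
      return status >= 300 && status < 400 && status !== 304; // 304 is not a redirect
    })
    .map((entry) => {
      const location = (entry.response.headers || []).find(
        (header) => header.name.toLowerCase() === 'location'
      );
      return {
        from: entry.request.url,
        status: entry.response.status,
        to: entry.response.redirectURL || (location ? location.value : null),
      };
    });
}
```

Usage: `console.table(findRedirects(loadHar('results.har')));`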
