How can I capture all network requests and full response data when loading a page in Chrome?
Problem description
Using Puppeteer, I'd like to load a URL in Chrome and capture the following information:
- request URL
- request headers
- request post data
- response headers text (including duplicate headers like set-cookie)
- transferred response size (i.e. compressed size)
- full response body
Capturing the full response body is what causes the problems for me.
Things I have tried:
- Getting the response content with response.buffer: this does not work if there is a redirect at any point, since buffers are wiped on navigation.
- Intercepting requests and using getResponseBodyForInterception: this means I can no longer access encodedLength, and in some cases I also had problems getting the correct request and response headers.
- Using a local proxy: this works, but it slowed down page load times significantly (and also changed some behavior, e.g. for certificate errors).
Ideally the solution should only have a minor performance impact and have no functional differences from loading a page normally. I would also like to avoid forking Chrome.
Recommended answer
You can enable request interception with page.setRequestInterception(true). Then, inside the page.on('request') handler, which fires for every request, you can use the request-promise-native module as a middleman: it fetches the same URL from Node to gather the response data, after which the original request is allowed to proceed in Puppeteer with request.continue(). Note that each resource is therefore fetched twice, once by Node and once by Chrome.
Here is a complete example:
'use strict';

const puppeteer = require('puppeteer');
const request_client = require('request-promise-native');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const result = [];

  // Pause every request so it can be mirrored from Node before Chrome sends it.
  await page.setRequestInterception(true);

  page.on('request', request => {
    // Replay the request with the same method, headers, and post data.
    request_client({
      uri: request.url(),
      method: request.method(),
      headers: request.headers(),
      body: request.postData(),
      resolveWithFullResponse: true,
      simple: false, // do not reject on non-2xx status codes
    }).then(response => {
      const request_url = request.url();
      const request_headers = request.headers();
      const request_post_data = request.postData();
      const response_headers = response.headers;
      // Taken from the content-length header; undefined if the server omits it.
      const response_size = response_headers['content-length'];
      const response_body = response.body;

      result.push({
        request_url,
        request_headers,
        request_post_data,
        response_headers,
        response_size,
        response_body,
      });

      // Let Chrome perform the original request.
      request.continue();
    }).catch(error => {
      console.error(error);
      request.abort();
    });
  });

  await page.goto('https://example.com/', {
    waitUntil: 'networkidle0',
  });

  // Print everything that was captured once the page has gone idle.
  console.log(result);

  await browser.close();
})();
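The question also asks for the response headers as text, including duplicates such as set-cookie, whereas the example above only stores the parsed response.headers object. The following is a minimal sketch of one way to rebuild that text, assuming the response object exposes Node's rawHeaders array (a flat list of alternating names and values). The helper name rawHeadersToText and the sample data are hypothetical and not part of the original answer.

```javascript
'use strict';

// Hypothetical helper: turns a flat [name, value, name, value, ...] array,
// such as Node's rawHeaders, into header text that keeps duplicate headers.
function rawHeadersToText(rawHeaders) {
  const lines = [];
  // Walk the array two entries at a time: a header name, then its value.
  for (let i = 0; i + 1 < rawHeaders.length; i += 2) {
    lines.push(`${rawHeaders[i]}: ${rawHeaders[i + 1]}`);
  }
  return lines.join('\r\n');
}

// Sample data made up for illustration only.
const sample = [
  'Content-Type', 'text/html',
  'Set-Cookie', 'a=1',
  'Set-Cookie', 'b=2',
];

console.log(rawHeadersToText(sample));
```

Under that assumption, the result could be stored alongside the other fields, for example as rawHeadersToText(response.rawHeaders) inside the then() callback.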