How to obtain a PDF embedded in a page through Puppeteer?
Problem description
I am trying to obtain a PDF copy of a page whose structure is like so:
<body style="background-color: rgb(38,38,38); height: 100%; width: 100%; overflow: hidden; margin: 0">
<embed width="100%" height="100%" name="plugin" id="plugin" src="https://www.thesourceurl.com" type="application/pdf" internalinstanceid="7" title="">
</body>
I tried getting it with page.pdf, but I got a blank PDF with "Couldn't load plugin" written in the middle.
Recommended answer
For anyone else who stumbles upon this question: at the time of writing, this is a known bug in Chromium where you are unable to navigate to a PDF, or to a page with an embedded PDF, in headless:true mode.
I found a temporary workaround for this here, though you have to know beforehand the URL from which the PDF will be obtained. First, expose a Node-side function to the page that writes a string of bytes to a file:
const fs = require('fs');

// Expose a Node-side function that in-page code can call to write a file.
// Run this inside an async function, before the page.evaluate call.
await page.exposeFunction("writeABString", async (strbuf, targetFile) => {
  // Convert a binary string (one character per byte) to an ArrayBuffer
  const str2ab = function _str2ab(str) {
    const buf = new ArrayBuffer(str.length); // 1 byte for each char
    const bufView = new Uint8Array(buf);
    for (let i = 0, strLen = str.length; i < strLen; i++) {
      bufView[i] = str.charCodeAt(i);
    }
    return buf;
  };
  console.log("In 'writeABString' function...");
  return new Promise((resolve, reject) => {
    // Convert the binary string back to an ArrayBuffer, which in turn is converted to a Buffer
    const buf = Buffer.from(str2ab(strbuf));
    // Try saving the file.
    fs.writeFile(targetFile, buf, (err) => {
      if (err) reject(err);
      else resolve(targetFile);
    });
  });
});
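As a side note, the manual str2ab loop above can probably be replaced by Node's built-in 'latin1' encoding, which maps each character code from 0 to 255 to a single byte. The sketch below compares the two on a few sample bytes; the helper name binaryStringToBuffer and the sample data are made up for illustration and are not part of Puppeteer or of the original workaround.

```javascript
// Manual conversion, as in the snippet above: one byte per character code.
function str2ab(str) {
  const bufView = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return bufView.buffer;
}

// Hypothetical helper: the same conversion using Node's built-in 'latin1' encoding.
function binaryStringToBuffer(str) {
  return Buffer.from(str, 'latin1');
}

// Sample data: the bytes of "%PDF" followed by a few non-ASCII byte values.
const sample = String.fromCharCode(0x25, 0x50, 0x44, 0x46, 0x00, 0x80, 0xff);

const manual = Buffer.from(str2ab(sample));
const builtin = binaryStringToBuffer(sample);

// Both approaches should yield identical bytes.
console.log(manual.equals(builtin));
```

If the two buffers match for your data, the body of writeABString could shrink to a single Buffer.from(strbuf, 'latin1') call followed by the file write.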
Then, on the page from which you need to get the PDF, use an evaluate call and the Fetch API to get the response as an ArrayBuffer, convert it to a string, and pass it to the exposed function:
await page.evaluate(async () => {
  // Convert an ArrayBuffer to a binary string (one character per byte)
  function arrayBufferToString(buffer) {
    var bufView = new Uint8Array(buffer);
    var length = bufView.length;
    var result = '';
    var addition = Math.pow(2, 8) - 1; // convert in chunks of 255 bytes to keep each apply() call small
    for (var i = 0; i < length; i += addition) {
      if (i + addition > length) {
        addition = length - i;
      }
      result += String.fromCharCode.apply(null, bufView.subarray(i, i + addition));
    }
    return result;
  }
  let geturl = "https://whateverurl.example.com";
  return fetch(geturl, {
    credentials: 'same-origin', // useful when we are logged into a website and want to send cookies
  })
    .then(response => response.arrayBuffer()) // read the response body as an ArrayBuffer
    .then(arrayBuffer => {
      var bufstring = arrayBufferToString(arrayBuffer);
      return window.writeABString(bufstring, '/tmp/downloadtest.pdf');
    })
    .catch(function (error) {
      console.log('Request failed: ', error);
    });
});
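The reason for the string detour is, as far as I understand it, that arguments passed to an exposed function have to be serializable, so a raw ArrayBuffer would not survive the trip from the page to Node. The sketch below runs in plain Node, without Puppeteer, and checks that the chunked conversion round-trips correctly. The JSON round trip only simulates the transport and is an assumption about how arguments are carried; the sample data and the helper stringToBuffer are invented for illustration.

```javascript
// Same chunked conversion as in the evaluate call: ArrayBuffer -> binary string.
function arrayBufferToString(buffer) {
  const bytes = new Uint8Array(buffer);
  const chunkSize = 255;
  let result = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray clamps the end index, so the last chunk may be shorter.
    result += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return result;
}

// Hypothetical helper for the reverse step on the Node side.
function stringToBuffer(str) {
  return Buffer.from(str, 'latin1');
}

// 1000 sample bytes: deliberately not a multiple of the chunk size.
const original = new Uint8Array(1000);
for (let i = 0; i < original.length; i++) {
  original[i] = i % 256;
}

const asString = arrayBufferToString(original.buffer);

// Simulate the page-to-Node transport with a JSON round trip (an assumption).
const transported = JSON.parse(JSON.stringify(asString));

const restored = stringToBuffer(transported);
const roundTripOk = restored.equals(Buffer.from(original));
console.log(roundTripOk);
```

Base64 would be another way to carry binary data as a string; it is not what the workaround above uses, but it avoids depending on how unusual characters are escaped in transit.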