How does axios handle blob vs arraybuffer as responseType?
Problem description
I'm downloading a zip file with axios. For further processing, I need to get the "raw" data that has been downloaded. As far as I can see, there are two types for this in JavaScript: Blob and ArrayBuffer. Both can be specified as responseType in the request options.
In a next step, the zip file needs to be uncompressed. I've tried two libraries for this: js-zip and adm-zip. Both want the data to be an ArrayBuffer. So far so good, I can convert the blob to a buffer, and after this conversion adm-zip always happily extracts the zip file. However, js-zip complains about a corrupted file unless the zip has been downloaded with 'arraybuffer' as the axios responseType. js-zip does not work on a buffer that has been taken from a blob.
This was very confusing to me. I thought both ArrayBuffer and Blob were essentially just views on the underlying memory. There might be a difference in performance between downloading something as a blob vs a buffer, but the resulting data should be the same, right?
Well, I decided to experiment and found this:
If you specify responseType: 'blob', axios converts response.data to a string. Let's say you hash this string and get hash A. Then you convert it to a buffer. For this conversion, you need to specify an encoding. Depending on the encoding, you will get a variety of new hashes; let's call them B1, B2, B3, ... When specifying 'utf8' as the encoding, I get back the original hash A.
So I guess when downloading data as a 'blob', axios implicitly converts it to a string encoded with utf8. This seems very reasonable.
Now you specify responseType: 'arraybuffer'. Axios provides you with a buffer as response.data. Hash the buffer and you get a hash C. This hash does not match any of A, B1, B2, ...
So when downloading data as an 'arraybuffer', you get entirely different data?
It now makes sense to me that the unzipping library js-zip complains if the data is downloaded as a 'blob'. It probably actually is corrupted somehow. But then how is adm-zip able to extract it? I checked the extracted data, and it is correct. This might only be the case for this specific zip archive, but it nevertheless surprises me.
Here is the sample code I used for my experiments:
// TypeScript import syntax; this is executed in Node.js
import axios from 'axios';
import * as crypto from 'crypto';

axios.get(
    "http://localhost:5000/folder.zip", // hosted with serve
    { responseType: 'blob' }) // replace this with 'arraybuffer' and response.data will be a buffer
    .then((response) => {
        console.log(typeof (response.data));
        // first hash the response itself
        console.log(crypto.createHash('md5').update(response.data).digest('hex'));
        // then convert to a buffer and hash again
        // replace 'binary' with any valid encoding name
        const buffer = Buffer.from(response.data, 'binary');
        console.log(crypto.createHash('md5').update(buffer).digest('hex'));
        //...
    });
What creates the difference here, and how do I get the 'true' downloaded data?
Recommended answer
From the axios documentation:
// `responseType` indicates the type of data that the server will respond with
// options are: 'arraybuffer', 'document', 'json', 'text', 'stream'
// browser only: 'blob'
responseType: 'json', // default
'blob' is a "browser only" option.
So from node.js, when you set responseType: "blob", "json" will actually be used, which I guess falls back to "text" when no parseable JSON data has been fetched.
Fetching binary data as text is prone to generating corrupted data. Because the text returned by Body.text() and many other APIs is a USVString (which doesn't allow unpaired surrogate code points), and because the response is decoded as UTF-8, some bytes from the binary file can't be mapped to characters correctly and will thus be replaced by the � (U+FFFD) replacement character, with no way to get back what that data was before: your data is corrupted.
Here is a snippet explaining this, using the header of a .png file (0x89 0x50 0x4E 0x47) as an example.
(async () => {
  const url = 'https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png';

  // fetch as binary
  const buffer = await fetch( url ).then( resp => resp.arrayBuffer() );
  const header = new Uint8Array( buffer ).slice( 0, 4 );
  console.log( 'binary header', header ); // [ 137, 80, 78, 71 ]
  console.log( 'entity encoded', entityEncode( header ) );
  // [ "U+0089", "U+0050", "U+004E", "U+0047" ]
  // You can read more about the (U+0089) character here
  // https://www.fileformat.info/info/unicode/char/0089/index.htm
  // You can see in the left table that this character needs two bytes in UTF-8 (0xC2 0x89).
  // A lone 0x89 byte is therefore not a valid UTF-8 sequence:
  // the decoder discards it and emits the replacement character instead.

  // read as UTF-8
  const utf8_str = await new Blob( [ header ] ).text();
  console.log( 'read as UTF-8', utf8_str ); // "�PNG"
  // build back a binary array from that string
  const utf8_binary = [ ...utf8_str ].map( char => char.charCodeAt( 0 ) );
  console.log( 'Which is binary', utf8_binary ); // [ 65533, 80, 78, 71 ]
  console.log( 'entity encoded', entityEncode( utf8_binary ) );
  // [ "U+FFFD", "U+0050", "U+004E", "U+0047" ]
  // You can read more about the � (U+FFFD) character here
  // https://www.fileformat.info/info/unicode/char/0fffd/index.htm
  //
  // P (U+0050), N (U+004E) and G (U+0047) are ASCII characters, encoded as one identical byte in UTF-8.
  // For these no information is lost
  // (that's how base64 encoding makes it possible to send binary data as text)

  // now let's see what fetching as text holds
  const fetched_as_text = await fetch( url ).then( resp => resp.text() );
  const header_as_text = fetched_as_text.slice( 0, 4 );
  console.log( 'fetched as "text"', header_as_text ); // "�PNG"
  const as_text_binary = [ ...header_as_text ].map( char => char.charCodeAt( 0 ) );
  console.log( 'Which is binary', as_text_binary ); // [ 65533, 80, 78, 71 ]
  console.log( 'entity encoded', entityEncode( as_text_binary ) );
  // [ "U+FFFD", "U+0050", "U+004E", "U+0047" ]
  // It's been read as UTF-8, we lost the first byte.
})();

function entityEncode( arr ) {
  return Array.from( arr ).map( val => 'U+' + toHex( val ) );
}

function toHex( num ) {
  return num.toString( 16 ).padStart( 4, '0' ).toUpperCase();
}
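The hashes observed in the question can be reproduced in Node.js without any network request. The following is only a sketch: the sample bytes are made up for illustration (a zip signature followed by bytes that are not valid UTF-8), and SHA-256 is used in place of the question's MD5. It shows that decoding binary data as UTF-8 and re-encoding it does not give back the original bytes, and that hashing the string directly matches hashing its 'utf8' re-encoding, because `hash.update()` encodes strings as UTF-8 by default.

```javascript
const crypto = require('crypto');

const sha256 = (data) => crypto.createHash('sha256').update(data).digest('hex');

// Made-up sample: zip signature "PK\x03\x04" followed by bytes that are not valid UTF-8.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x89, 0xff, 0xc3, 0x28]);

// What effectively happens when the response body is read as text.
const asText = original.toString('utf8');

// Converting the string back to bytes with two different encodings.
const viaUtf8 = Buffer.from(asText, 'utf8');     // each U+FFFD becomes EF BF BD
const viaBinary = Buffer.from(asText, 'binary'); // keeps only the low byte of each code unit

console.log('original   ', original.toString('hex'), sha256(original));  // hash "C"
console.log('as string  ', sha256(asText));                              // hash "A"
console.log("via 'utf8' ", viaUtf8.toString('hex'), sha256(viaUtf8));    // same as "A"
console.log("via 'binary'", viaBinary.toString('hex'), sha256(viaBinary)); // some "B"
```

None of the re-encoded buffers equals the original, which is consistent with hash C matching neither A nor any of the B hashes.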
There is natively no Blob object in node.js, so it makes sense that axios didn't monkey-patch one just so it could return a response that nobody else would be able to consume anyway.
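In practice this means that in Node.js the raw bytes should be requested with responseType: 'arraybuffer'. Below is a minimal sketch of that; `downloadBinary` and its `client` parameter are hypothetical names introduced here, and `client` is assumed to be the axios instance (it is passed in as a parameter only so the helper can be exercised without a network).

```javascript
// Hypothetical helper: `client` is assumed to be an axios instance (or anything
// exposing a compatible `get(url, config)` method that resolves to `{ data }`).
async function downloadBinary(client, url) {
  // 'arraybuffer' asks for the body as raw bytes, with no text decoding step.
  const response = await client.get(url, { responseType: 'arraybuffer' });
  // Normalise to a Node.js Buffer, whether `data` is a Buffer or an ArrayBuffer.
  return Buffer.from(response.data);
}

// Usage, assuming axios is installed and the file from the question is being served:
//   const axios = require('axios');
//   downloadBinary(axios, 'http://localhost:5000/folder.zip')
//     .then((bytes) => console.log(bytes.length, 'bytes downloaded'));
```

The resulting Buffer can then be handed to the unzipping library directly, without any string conversion in between.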
From a browser, you'd see exactly the same behavior: 'json' and 'text' produce corrupted data, while 'arraybuffer' and 'blob' both work:
function fetchAs( type ) {
  return axios( {
    method: 'get',
    url: 'https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png',
    responseType: type
  } );
}

function loadImage( data, type ) {
  // all of these can be passed to the Blob constructor directly
  const new_blob = new Blob( [ data ], { type: 'image/png' } );
  // with a blob: URI, the browser will try to load 'data' as-is
  const url = URL.createObjectURL( new_blob );
  const img = document.getElementById( type + '_img' );
  img.src = url;
  return new Promise( (res, rej) => {
    img.onload = e => res( img );
    img.onerror = rej;
  } );
}

[
  'json', // will fail
  'text', // will fail
  'arraybuffer',
  'blob'
].forEach( type =>
  fetchAs( type )
    .then( resp => loadImage( resp.data, type ) )
    .then( img => console.log( type, 'loaded' ) )
    .catch( err => console.error( type, 'failed' ) )
);
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
<figure>
<figcaption>json</figcaption>
<img id="json_img">
</figure>
<figure>
<figcaption>text</figcaption>
<img id="text_img">
</figure>
<figure>
<figcaption>arraybuffer</figcaption>
<img id="arraybuffer_img">
</figure>
<figure>
<figcaption>blob</figcaption>
<img id="blob_img">
</figure>
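When a Blob does need to be turned into raw bytes, it is safer to go through `Blob.prototype.arrayBuffer()` than through a string. The sketch below assumes an environment with a global `Blob` (browsers, or Node.js 18 and later); `compareBlobReads` is a hypothetical name used only for this illustration.

```javascript
// Reads the same four PNG header bytes from a Blob in two ways.
async function compareBlobReads() {
  const bytes = Uint8Array.from([0x89, 0x50, 0x4e, 0x47]);
  const blob = new Blob([bytes]);

  // Lossless: the bytes come back unchanged.
  const viaArrayBuffer = Array.from(new Uint8Array(await blob.arrayBuffer()));

  // Lossy: the blob is decoded as UTF-8 first, so the invalid 0x89 byte is replaced.
  const viaText = Array.from(await blob.text(), (ch) => ch.charCodeAt(0));

  return { viaArrayBuffer, viaText };
}

compareBlobReads().then(({ viaArrayBuffer, viaText }) => {
  console.log('via arrayBuffer()', viaArrayBuffer); // [ 137, 80, 78, 71 ]
  console.log('via text()       ', viaText);        // [ 65533, 80, 78, 71 ]
});
```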