HTML5 File API read as text and binary
Question
I am currently working with the HTML5 File API, and I need to get binary file data. The FileReader's readAsText and readAsDataURL methods work fine, but readAsBinaryString returns the same data as readAsText.
I need binary data, but I'm getting a text string. Am I missing something?
Recommended answer
Note in 2018: readAsBinaryString is outdated. For use cases where you would previously have used it, these days you'd use readAsArrayBuffer (or in some cases, readAsDataURL) instead.
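As a rough sketch of what that replacement can look like, assuming a browser that supports FileReader and typed arrays: readFileBytes and bytesToHex are hypothetical helper names chosen for illustration, not part of the File API.

```javascript
// Format raw bytes as space-separated two-digit hex values.
function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// Browser-only: read a File or Blob as raw bytes via readAsArrayBuffer.
// (readFileBytes is a hypothetical helper name, not a File API method.)
function readFileBytes(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(new Uint8Array(reader.result));
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
}

// Possible usage in a page that has <input type="file" id="fileinput">:
// readFileBytes(document.getElementById("fileinput").files[0])
//   .then((bytes) => console.log(bytesToHex(bytes)));
```

Working with a Uint8Array means each element is already a number in the range 0..255, so no string round trip is needed.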
The specification for readAsBinaryString says that the data must be represented as a binary string, where:
...every byte is represented by an integer in the range [0..255].
JavaScript originally didn't have a "binary" type (typed arrays* (details below) arrived later, first through a Khronos specification written for WebGL and then as part of ECMAScript 2015, which standardized ArrayBuffer and its views), and so they went with a String with the guarantee that no character stored in the String would be outside the range 0..255. (They could have gone with an array of Numbers instead, but they didn't; perhaps large Strings are more memory-efficient than large arrays of Numbers, since Numbers are floating-point.)
If you're reading a file that's mostly text in a western script (mostly English, for instance), then that string is going to look a lot like text. If you read a file with non-ASCII characters in it, you should notice a difference, since JavaScript strings are UTF-16** (details below) and so some characters will have values above 255, whereas a "binary string" according to the File API spec wouldn't have any values above 255 (each byte of a multi-byte character shows up as its own separate "character").
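To make the difference concrete, here is a small sketch that does not touch FileReader at all. It assumes the file content is the euro sign stored as UTF-8, and toBinaryString is a hypothetical helper that mimics the one-character-per-byte layout described above.

```javascript
// Build a File API style "binary string": one character per byte,
// each with a character code in the range 0..255.
function toBinaryString(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// The euro sign (U+20AC) occupies three bytes in UTF-8: e2 82 ac.
const utf8Bytes = new TextEncoder().encode("\u20ac");

const asBinary = toBinaryString(utf8Bytes);                // three "characters"
const asText = new TextDecoder("utf-8").decode(utf8Bytes); // one character

console.log(asBinary.length);                   // 3
console.log(asText.length);                     // 1
console.log(asText.charCodeAt(0).toString(16)); // "20ac", well above 255
```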
If you're reading a file that's not text at all (an image, perhaps), you'll probably still get a very similar result between readAsText and readAsBinaryString, but with readAsBinaryString you know that there won't be any attempt to interpret multi-byte sequences as characters. You don't know that if you use readAsText, because readAsText will use an encoding determination to try to figure out what the file's encoding is and then map it to JavaScript's UTF-16 strings.
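The following sketch illustrates why text decoding is risky for non-text data. It assumes the bytes end up being decoded as UTF-8 (the actual encoding readAsText picks depends on its encoding determination) and uses the first four bytes of a PNG signature as sample input.

```javascript
// First four bytes of the PNG file signature: 0x89 followed by "PNG".
const pngStart = Uint8Array.of(0x89, 0x50, 0x4e, 0x47);

// Byte-per-character view: every value stays in 0..255, nothing is lost.
const binaryCodes = Array.from(pngStart);

// Text view, assuming UTF-8: 0x89 is not a valid lead byte, so the
// decoder substitutes U+FFFD and the original byte cannot be recovered.
const decodedText = new TextDecoder("utf-8").decode(pngStart);
const textCodes = Array.from(decodedText, (ch) => ch.charCodeAt(0));

console.log(binaryCodes.map((n) => n.toString(16)).join(" ")); // 89 50 4e 47
console.log(textCodes.map((n) => n.toString(16)).join(" "));   // fffd 50 4e 47
```

The two results look similar, as described above, but only the byte-per-character view preserves the data.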
You can see the effect if you create a file and store it in something other than ASCII or UTF-8. (In Windows you can do this via Notepad; the "Save As" dialog has an encoding drop-down with "Unicode" on it, by which, judging from the data, they seem to mean UTF-16; I'm sure Mac OS and *nix editors have a similar feature.) Here's a page that dumps the result of reading a file both ways:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>Show File Data</title>
<style type='text/css'>
body {
    font-family: sans-serif;
}
</style>
<script type='text/javascript'>
    function loadFile() {
        var input, file, fr;
        if (typeof window.FileReader !== 'function') {
            bodyAppend("p", "The file API isn't supported on this browser yet.");
            return;
        }
        input = document.getElementById('fileinput');
        if (!input) {
            bodyAppend("p", "Um, couldn't find the fileinput element.");
        }
        else if (!input.files) {
            bodyAppend("p", "This browser doesn't seem to support the `files` property of file inputs.");
        }
        else if (!input.files[0]) {
            bodyAppend("p", "Please select a file before clicking 'Load'");
        }
        else {
            file = input.files[0];
            fr = new FileReader();
            fr.onload = receivedText;
            fr.readAsText(file);
        }
        function receivedText() {
            showResult(fr, "Text");
            fr = new FileReader();
            fr.onload = receivedBinary;
            fr.readAsBinaryString(file);
        }
        function receivedBinary() {
            showResult(fr, "Binary");
        }
    }
    function showResult(fr, label) {
        var markup, result, n, aByte, byteStr;
        markup = [];
        result = fr.result;
        for (n = 0; n < result.length; ++n) {
            aByte = result.charCodeAt(n);
            byteStr = aByte.toString(16);
            if (byteStr.length < 2) {
                byteStr = "0" + byteStr;
            }
            markup.push(byteStr);
        }
        bodyAppend("p", label + " (" + result.length + "):");
        bodyAppend("pre", markup.join(" "));
    }
    function bodyAppend(tagName, innerHTML) {
        var elm;
        elm = document.createElement(tagName);
        elm.innerHTML = innerHTML;
        document.body.appendChild(elm);
    }
</script>
</head>
<body>
<form action='#' onsubmit="return false;">
<input type='file' id='fileinput'>
<input type='button' id='btnLoad' value='Load' onclick='loadFile();'>
</form>
</body>
</html>
If I use that with a "Testing 1 2 3" file stored in UTF-16, here are the results I get:
Text (13):
54 65 73 74 69 6e 67 20 31 20 32 20 33
Binary (28):
ff fe 54 00 65 00 73 00 74 00 69 00 6e 00 67 00 20 00 31 00 20 00 32 00 20 00 33 00
As you can see, readAsText interpreted the characters and so I got 13 (the length of "Testing 1 2 3"), and readAsBinaryString didn't, and so I got 28 (the two-byte BOM plus two bytes for each character).
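The same arithmetic can be reproduced without a file. This is a sketch under the assumption that the file is little-endian UTF-16 with an FF FE byte order mark, as in the dump above; encodeUtf16leWithBom is a hypothetical helper written only for this illustration.

```javascript
// Encode a string as UTF-16LE bytes, preceded by the FF FE byte order mark.
// (Hypothetical helper; it writes each UTF-16 code unit low byte first.)
function encodeUtf16leWithBom(text) {
  const bytes = new Uint8Array(2 + text.length * 2);
  bytes[0] = 0xff;
  bytes[1] = 0xfe;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes[2 + i * 2] = unit & 0xff; // low byte
    bytes[3 + i * 2] = unit >> 8;   // high byte
  }
  return bytes;
}

const fileBytes = encodeUtf16leWithBom("Testing 1 2 3");

// TextDecoder removes a matching BOM by default, much as readAsText did above.
const decoded = new TextDecoder("utf-16le").decode(fileBytes);

console.log(fileBytes.length); // 28: 2 BOM bytes + 13 characters * 2 bytes
console.log(decoded.length);   // 13
```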
* XMLHttpRequest.response with responseType = "arraybuffer" is supported in HTML 5.
** "JavaScript strings are UTF-16" may seem like an odd statement; aren't they just Unicode? No, a JavaScript string is a series of UTF-16 code units; you see surrogate pairs as two individual JavaScript "characters" even though, in fact, the surrogate pair as a whole is just one character.
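A short illustration of that footnote, using U+1F600 (an emoji outside the Basic Multilingual Plane) as an arbitrary example character:

```javascript
// U+1F600 is stored as the surrogate pair D83D DE00.
const face = "\u{1F600}";

console.log(face.length);                      // 2 UTF-16 code units
console.log(face.charCodeAt(0).toString(16));  // "d83d" (high surrogate)
console.log(face.charCodeAt(1).toString(16));  // "de00" (low surrogate)
console.log(face.codePointAt(0).toString(16)); // "1f600" (the whole character)
console.log(Array.from(face).length);          // 1, since iteration goes by code point
```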