HTML5 File API read as text and binary

Question

I am currently working on the HTML5 File API, and I need to get binary file data. The FileReader's readAsText and readAsDataURL methods work fine, but readAsBinaryString returns the same data as readAsText.

I need binary data, but I'm getting a text string. Am I missing something?

Recommended answer

Note in 2018: readAsBinaryString is outdated. For use cases where previously you'd have used it, these days you'd use readAsArrayBuffer (or in some cases, readAsDataURL) instead.
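
As a rough sketch of the readAsArrayBuffer route mentioned in the note: the helper names readFileBytes and bytesToHex below are made up for illustration, and readFileBytes assumes a browser that provides FileReader.

```javascript
// Hypothetical helper: read a File/Blob into a Uint8Array via readAsArrayBuffer.
// Browser-only, since it relies on FileReader.
function readFileBytes(file) {
    return new Promise(function (resolve, reject) {
        var fr = new FileReader();
        fr.onload = function () {
            // fr.result is an ArrayBuffer; wrap it to get at the individual bytes
            resolve(new Uint8Array(fr.result));
        };
        fr.onerror = function () {
            reject(fr.error);
        };
        fr.readAsArrayBuffer(file);
    });
}

// Hypothetical helper: format bytes as two-digit hex values separated by spaces.
function bytesToHex(bytes) {
    return Array.from(bytes, function (b) {
        return b.toString(16).padStart(2, "0");
    }).join(" ");
}

// Works anywhere typed arrays exist; no file needed for this part.
console.log(bytesToHex(new Uint8Array([0xff, 0xfe, 0x54, 0x00]))); // ff fe 54 00
```

In a page, something like readFileBytes(input.files[0]).then(function (bytes) { console.log(bytesToHex(bytes)); }) would hand you the raw bytes without any string in between.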

readAsBinaryString says that the data must be represented as a binary string, where:

...every byte is represented by an integer in the range [0..255].
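
To make that definition concrete, here is a small sketch of what such a string looks like when built by hand. The function names bytesToBinaryString and binaryStringToBytes are illustrative only, not part of the File API.

```javascript
// Build a "binary string": one character per byte, each with a code in 0..255.
function bytesToBinaryString(bytes) {
    var out = "";
    for (var i = 0; i < bytes.length; ++i) {
        out += String.fromCharCode(bytes[i]);
    }
    return out;
}

// Recover the bytes by reading each character's code.
function binaryStringToBytes(str) {
    var bytes = new Uint8Array(str.length);
    for (var i = 0; i < str.length; ++i) {
        bytes[i] = str.charCodeAt(i) & 0xff;
    }
    return bytes;
}

var sample = new Uint8Array([0x00, 0x41, 0x7f, 0x80, 0xff]);
var binary = bytesToBinaryString(sample);
console.log(binary.length);        // 5
console.log(binary.charCodeAt(4)); // 255
```

The string is only a container here: its length equals the byte count, and no character code ever exceeds 255.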

JavaScript originally didn't have a "binary" type (until typed arrays* (details below) arrived, first through the Khronos Typed Array specification created for WebGL and later as a standard part of the language in ECMAScript 2015 with ArrayBuffer and its views), and so they went with a String with the guarantee that no character stored in the String would be outside the range 0..255. (They could have gone with an array of Numbers instead, but they didn't; perhaps large Strings are more memory-efficient than large arrays of Numbers, since Numbers are floating-point.)

If you're reading a file that's mostly text in a western script (mostly English, for instance), then that string is going to look a lot like text. If you read a file with non-ASCII Unicode characters in it, you should notice a difference, since JavaScript strings are UTF-16** (details below) and so some characters will have values above 255, whereas a "binary string" according to the File API spec wouldn't have any values above 255 (a character stored as two bytes in the file shows up as two individual "characters", one per byte).
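
A quick way to see that difference without a file is to compare the two views of the same bytes. This sketch assumes the bytes are UTF-8 and uses the standard TextEncoder/TextDecoder to stand in for what a text read does; the variable names are illustrative.

```javascript
// The euro sign is U+20AC; UTF-8 encodes it as the three bytes e2 82 ac.
var euroBytes = new TextEncoder().encode("\u20ac");

// What a text read does: decode the bytes into UTF-16 code units.
var asText = new TextDecoder("utf-8").decode(euroBytes);

// What a binary string holds: one character per byte.
var asBinary = "";
euroBytes.forEach(function (b) {
    asBinary += String.fromCharCode(b);
});

console.log(asText.length);                     // 1
console.log(asText.charCodeAt(0).toString(16)); // 20ac
console.log(asBinary.length);                   // 3
```

The text view has one character with a value above 255; the binary view has three characters, all within 0..255.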

If you're reading a file that's not text at all (an image, perhaps), you'll probably still get a very similar result between readAsText and readAsBinaryString, but with readAsBinaryString you know that there won't be any attempt to interpret multi-byte sequences as characters. You don't know that if you use readAsText, because readAsText will use an encoding determination to try to figure out what the file's encoding is and then map it to JavaScript's UTF-16 strings.
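
To illustrate why interpreting non-text bytes as characters is risky, the sketch below decodes the first four bytes of a PNG file as text. It assumes the decoder settles on UTF-8, and uses TextDecoder as a stand-in for the decoding step of readAsText.

```javascript
// First four bytes of a PNG file: 89 50 4e 47.
var pngStart = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);

// 0x89 is not valid on its own in UTF-8, so the decoder substitutes U+FFFD.
var decoded = new TextDecoder("utf-8").decode(pngStart);
console.log(decoded.charCodeAt(0).toString(16)); // fffd
console.log(decoded.slice(1));                   // PNG

// Re-encoding does not bring the original byte back: U+FFFD becomes ef bf bd.
var roundTripped = new TextEncoder().encode(decoded);
console.log(roundTripped.length); // 6
```

Once a byte has been replaced this way, the original value is gone, which is why a text read is the wrong tool for image data.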

You can see the effect if you create a file and store it in something other than ASCII or UTF-8. (In Windows you can do this via Notepad; the "Save As" dialog has an encoding drop-down with "Unicode" on it, by which, looking at the data, they seem to mean UTF-16; I'm sure Mac OS and *nix editors have a similar feature.) Here's a page that dumps the result of reading a file both ways:

<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>Show File Data</title>
<style type='text/css'>
body {
    font-family: sans-serif;
}
</style>
<script type='text/javascript'>

    function loadFile() {
        var input, file, fr;

        if (typeof window.FileReader !== 'function') {
            bodyAppend("p", "The file API isn't supported on this browser yet.");
            return;
        }

        input = document.getElementById('fileinput');
        if (!input) {
            bodyAppend("p", "Um, couldn't find the fileinput element.");
        }
        else if (!input.files) {
            bodyAppend("p", "This browser doesn't seem to support the `files` property of file inputs.");
        }
        else if (!input.files[0]) {
            bodyAppend("p", "Please select a file before clicking 'Load'");
        }
        else {
            file = input.files[0];
            fr = new FileReader();
            fr.onload = receivedText;
            fr.readAsText(file);
        }

        function receivedText() {
            showResult(fr, "Text");

            fr = new FileReader();
            fr.onload = receivedBinary;
            fr.readAsBinaryString(file);
        }

        function receivedBinary() {
            showResult(fr, "Binary");
        }
    }

    function showResult(fr, label) {
        var markup, result, n, aByte, byteStr;

        markup = [];
        result = fr.result;
        for (n = 0; n < result.length; ++n) {
            aByte = result.charCodeAt(n);
            byteStr = aByte.toString(16);
            if (byteStr.length < 2) {
                byteStr = "0" + byteStr;
            }
            markup.push(byteStr);
        }
        bodyAppend("p", label + " (" + result.length + "):");
        bodyAppend("pre", markup.join(" "));
    }

    function bodyAppend(tagName, innerHTML) {
        var elm;

        elm = document.createElement(tagName);
        elm.innerHTML = innerHTML;
        document.body.appendChild(elm);
    }

</script>
</head>
<body>
<form action='#' onsubmit="return false;">
<input type='file' id='fileinput'>
<input type='button' id='btnLoad' value='Load' onclick='loadFile();'>
</form>
</body>
</html>

If I use that with a "Testing 1 2 3" file stored in UTF-16, here are the results I get:

Text (13):

54 65 73 74 69 6e 67 20 31 20 32 20 33

Binary (28):

ff fe 54 00 65 00 73 00 74 00 69 00 6e 00 67 00 20 00 31 00 20 00 32 00 20 00 33 00

As you can see, readAsText interpreted the characters and so I got 13 (the length of "Testing 1 2 3"), and readAsBinaryString didn't, and so I got 28 (the two-byte BOM plus two bytes for each character).
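
The same two counts can be reproduced without a file. This sketch builds the UTF-16 little-endian bytes by hand (utf16leWithBom is a made-up helper name) and decodes them with TextDecoder, which by default drops a leading BOM, much as the text read did above.

```javascript
// Hypothetical helper: the bytes of a UTF-16LE file with a BOM (ff fe) in front.
function utf16leWithBom(text) {
    var bytes = new Uint8Array(2 + text.length * 2);
    bytes[0] = 0xff;
    bytes[1] = 0xfe;
    for (var i = 0; i < text.length; ++i) {
        var unit = text.charCodeAt(i);
        bytes[2 + i * 2] = unit & 0xff; // low byte first (little-endian)
        bytes[3 + i * 2] = unit >> 8;   // then the high byte
    }
    return bytes;
}

var fileBytes = utf16leWithBom("Testing 1 2 3");
var decodedText = new TextDecoder("utf-16le").decode(fileBytes);

console.log(fileBytes.length);   // 28
console.log(decodedText.length); // 13
```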

* XMLHttpRequest.response with responseType = "arraybuffer" is supported in HTML 5.

** "JavaScript strings are UTF-16" may seem like an odd statement; aren't they just Unicode? No, a JavaScript string is a series of UTF-16 code units; you see surrogate pairs as two individual JavaScript "characters" even though, in fact, the surrogate pair as a whole is just one character. See the link for details.
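
A short illustration of the surrogate-pair point, using U+1F600 as an arbitrary example of a character outside the Basic Multilingual Plane:

```javascript
// U+1F600 written out as its two UTF-16 code units (a surrogate pair).
var face = "\ud83d\ude00";

console.log(face.length);                      // 2
console.log(face.charCodeAt(0).toString(16));  // d83d
console.log(face.charCodeAt(1).toString(16));  // de00
console.log(face.codePointAt(0).toString(16)); // 1f600
console.log(Array.from(face).length);          // 1
```

length and charCodeAt count code units, while codePointAt and string iteration (used by Array.from) work on whole code points.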
