Blob charset for CSV file

Posted: 2022/3/3 21:43:18. Tags: javascript, character-encoding, blob


Question

I want to create a CSV file using a Blob. The file should be encoded in ANSI, but it does not work.

var blob = new Blob(["\ufeff", csvFile], { type: 'text/csv;charset=windows-1252;' });

The file is always created with UTF-8 encoding.

Recommended answer

Passing a USVString (or a JavaScript string) to the Blob constructor automatically encodes it as UTF-8 in the Blob's data.

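The type option is only used by resource fetchers, to mimic the Content-Type header of an HTTP request.
So, for instance, when you fetch that Blob through a blob: URI, or serve it that way, this type value is used. Similarly, the charset= information there may be used when FileReader's readAsText( blob ) method is called without its second argument.

However, this type option does not change the content of the Blob's data at all.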
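As a quick synchronous check of the point above, here is a minimal sketch. It assumes a runtime with global `Blob` and `Response` (current browsers, or Node.js 18 and later); the variable names are illustrative only.

```javascript
// The `type` is only metadata; the string is stored as UTF-8 either way.
const typedBlob = new Blob(["é"], { type: "text/csv;charset=windows-1252" });

// Two bytes (0xC3 0xA9), not the single 0xE9 byte Windows-1252 would use.
console.log(typedBlob.size); // 2

// Wrapping the Blob in a Response shows where `type` ends up:
// it is copied into the Content-Type header.
const contentType = new Response(typedBlob).headers.get("content-type");
console.log(contentType === typedBlob.type);
```

The size stays at two bytes whatever `type` says, which is why the downloaded file is still UTF-8.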

(async ()=> {

  const data = "é";
  const no_type = new Blob( [ data ] );
  const csv_windows1252 = new Blob( [ data ], { type: "text/csv;charset=Windows-1252" } );
  const image_png = new Blob( [ data ], { type: "image/png" } );

  // read as ArrayBuffer to see the exact binary content
  console.log( "no_type:", await hexDump( no_type ) ); // C3A9
  console.log( "csv_windows1252:", await hexDump( csv_windows1252 ) ); // C3A9
  console.log( "image_png:", await hexDump( image_png ) ); // C3A9

})();

async function hexDump( blob ) {
  const buf = await blob.arrayBuffer();
  const view = new Uint8Array( buf );
  const arr = [ ... view ];
  // pad each byte to two hex digits so that e.g. 0x0A prints as "0A"
  return arr.map( (val) => val.toString( 16 ).padStart( 2, "0" ) )
    .join( "" ).toUpperCase();
}

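As you can see in this snippet, all these Blobs hold exactly the same bytes regardless of the type parameter: 0xC3 0xA9 (C3A9), which is the UTF-8 encoding of the character é (U+00E9).
In ANSI (Windows-1252), this character is represented by the single byte 0xE9 (E9), so a Blob holding ANSI-encoded text should contain that byte instead.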

One way to see this is to use a TextDecoder and decode both byte sequences with both encodings:

const UTF8Content = new Uint8Array( [ 0xC3, 0xA9 ] );
const ANSIContent = new Uint8Array( [ 0xE9 ] );

const UTF8Decoder = new TextDecoder( "utf-8" );
const ANSIDecoder = new TextDecoder( "windows-1252" );

console.log( "UTF8-content decoded as UTF8",
  UTF8Decoder.decode( UTF8Content )
); // é
console.log( "UTF8-content decoded as ANSI",
  ANSIDecoder.decode( UTF8Content )
); // Ã©
console.log( "ANSI-content decoded as UTF8",
  UTF8Decoder.decode( ANSIContent )
); // �
console.log( "ANSI-content decoded as ANSI",
  ANSIDecoder.decode( ANSIContent )
); // é

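So, to get what you need, you have to generate the Blob from a TypedArray containing data that is already encoded in ANSI.
There used to be an option in the TextEncoder API to encode USVStrings to arbitrary encodings, but it has been removed from the specification and from browsers.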

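So the easiest way is to use a library to perform the conversion. Here I'll use the inexorabletash/text-encoding polyfill, loaded by the script tags that follow the snippet: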

const text = "é";
const data = new TextEncoder( "windows-1252", {
  NONSTANDARD_allowLegacyEncoding: true
} ).encode( text ); // now `data` is an Uint8Array

const blob = new Blob( [ data ], { type: "text/csv" } ); // here you have your ANSI Blob

// Just to be sure
hexDump( blob ).then( console.log ); // E9

async function hexDump( blob ) {
  const buf = await blob.arrayBuffer();
  const view = new Uint8Array( buf );
  const arr = [ ...view ];
  // pad each byte to two hex digits so that e.g. 0x0A prints as "0A"
  return arr.map( (val) => val.toString( 16 ).padStart( 2, "0" ) )
    .join( "" ).toUpperCase();
}

<script>
  // we need to force installation of the library
  // by removing the built-in API
  window.TextEncoder = null;
</script>
<script src="https://cdn.jsdelivr.net/gh/inexorabletash/text-encoding/lib/encoding-indexes.js"></script>
<script src="https://cdn.jsdelivr.net/gh/inexorabletash/text-encoding/lib/encoding.js"></script>

The same example is also available as a fiddle with a download link, since Stack Snippets no longer allow downloads.
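If adding a library is not an option, a library-free route can be sketched with the built-in TextDecoder: decode each of the 256 possible bytes once and invert the result into a character-to-byte table. This is only a sketch, not the answer's method. `buildWindows1252Table` and `encodeWindows1252` are hypothetical helper names, and the code assumes the runtime's TextDecoder accepts the "windows-1252" label (browsers do; Node.js needs its default full-ICU build).

```javascript
// Hypothetical helper: build a reverse lookup (character -> byte)
// by decoding every possible byte with the built-in decoder.
function buildWindows1252Table() {
  const decoder = new TextDecoder("windows-1252");
  const table = new Map();
  for (let byte = 0; byte < 256; byte++) {
    const ch = decoder.decode(new Uint8Array([byte]));
    // Skip bytes the decoder could not map to a real character.
    if (ch !== "\uFFFD") table.set(ch, byte);
  }
  return table;
}

// Hypothetical helper: encode a string as windows-1252 bytes,
// throwing on any character that has no single-byte mapping.
function encodeWindows1252(text) {
  const table = buildWindows1252Table();
  const chars = [...text]; // iterate by code point, not by UTF-16 unit
  const bytes = new Uint8Array(chars.length);
  chars.forEach((ch, i) => {
    const byte = table.get(ch);
    if (byte === undefined) {
      throw new RangeError(`Cannot encode ${JSON.stringify(ch)} as windows-1252`);
    }
    bytes[i] = byte;
  });
  return bytes;
}

const ansiBytes = encodeWindows1252("é;€\r\n");
const ansiBlob = new Blob([ansiBytes], { type: "text/csv;charset=windows-1252" });

console.log([...ansiBytes].map((b) => b.toString(16).padStart(2, "0")).join(" "));
// e9 3b 80 0d 0a
```

Because the Blob is built from a Uint8Array rather than a string, no UTF-8 re-encoding takes place and the bytes are stored as given.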

Important note:

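ANSI only supports a limited set of characters, and some characters that a USVString can contain cannot be mapped to ANSI. You must therefore make sure your input only contains mappable characters, otherwise the encoder will throw: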

const text = "😱"; // can't be mapped to ANSI
const data = new TextEncoder( "windows-1252", {
  NONSTANDARD_allowLegacyEncoding: true
} ).encode( text ); // throws

<script>
  window.TextEncoder = null;
</script>
<script src="https://cdn.jsdelivr.net/gh/inexorabletash/text-encoding/lib/encoding-indexes.js"></script>
<script src="https://cdn.jsdelivr.net/gh/inexorabletash/text-encoding/lib/encoding.js"></script>
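One way to avoid that exception is to check the input before encoding it. The sketch below is an illustration under assumptions, not part of the library: `findUnmappable` is a hypothetical helper, and it relies on the built-in TextDecoder accepting the "windows-1252" label.

```javascript
// Hypothetical helper: list the distinct characters of `text`
// that have no windows-1252 byte.
function findUnmappable(text) {
  const decoder = new TextDecoder("windows-1252");
  const mappable = new Set();
  for (let byte = 0; byte < 256; byte++) {
    mappable.add(decoder.decode(new Uint8Array([byte])));
  }
  // Iterating a string through Set walks it code point by code point.
  return [...new Set(text)].filter((ch) => !mappable.has(ch));
}

const bad = findUnmappable("café 😱");
console.log(bad.length, bad[0]); // 1 😱

// One possible policy: replace anything unmappable with "?" before encoding.
const safeText = [...("café 😱")].map((ch) => (bad.includes(ch) ? "?" : ch)).join("");
console.log(safeText); // café ?
```

Whether to reject, replace, or strip such characters depends on what the consumer of the CSV file expects.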

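PS: the \uFEFF character you prepend to your Blob's data is the byte order mark (BOM). In UTF-16 it tells readers the intended byte order of the text. It does not encode the following data in any way, and since the Blob stores strings as UTF-8 it ends up as the bytes EF BB BF, which is of no help for a file that is meant to be ANSI.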
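To see what the prepended \uFEFF turns into, here is a small sketch using only the standard TextEncoder and Blob (variable names are illustrative):

```javascript
// A string BOM is encoded as UTF-8 like any other character.
const bomBytes = new TextEncoder().encode("\uFEFF");
console.log([...bomBytes].map((b) => b.toString(16).padStart(2, "0")).join(" "));
// ef bb bf

// Prepending it to a Blob adds those three bytes; the data after it is still UTF-8.
const withBom = new Blob(["\uFEFF", "é"]);
console.log(withBom.size); // 5 (EF BB BF + C3 A9)
```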
