Adding UTF-8 BOM to string/Blob

Problem description

I need to add a UTF-8 byte-order-mark to generated text data on client side. How do I do that?

Using new Blob(['\xEF\xBB\xBF' + content]) yields 'ï»¿"my data"', of course.

'\uBBEF\x22BF' did not work either (with '\x22' == '"' being the next character in content).
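To make the failure above concrete, here is a small sketch of what happens to the escaped string when it is encoded as UTF-8, which is how a Blob serializes string parts. The helper name utf8Hex is hypothetical, and the sketch assumes an environment where TextEncoder is a global (modern browsers, recent Node.js).

```javascript
// Hypothetical helper: hex dump of the UTF-8 encoding of a string.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text))
    .map((b) => b.toString(16).padStart(2, '0'));
}

// '\xEF\xBB\xBF' is three characters (U+00EF, U+00BB, U+00BF), not three bytes.
// Each of them needs two bytes in UTF-8, so the BOM byte sequence is not produced.
console.log(utf8Hex('\xEF\xBB\xBF')); // [ 'c3', 'af', 'c2', 'bb', 'c2', 'bf' ]
```

The six bytes decode back to the three visible characters ï»¿, which is why they show up in front of the data instead of acting as a BOM.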

Is it possible to prepend the UTF-8 BOM in JavaScript to a generated text?

Yes, I really do need the UTF-8 BOM in this case.

Solution

Prepend \ufeff to the string. See http://msdn.microsoft.com/en-us/library/ie/2yfce773(v=vs.94).aspx

See the discussion between @jeff-fischer and @casey for details on UTF-8, UTF-16, and the BOM. What actually makes the above work is that the character \ufeff (U+FEFF) always represents the BOM, regardless of whether UTF-8 or UTF-16 is used. When the text is serialized as UTF-8, that character is written as the bytes EF BB BF.
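A minimal sketch of the approach described above, assuming an environment where Blob and TextEncoder are globals (modern browsers, recent Node.js). The names withUtf8Bom, withUtf8BomBytes, and utf8Hex are hypothetical. The second helper, which passes the raw BOM bytes as a Uint8Array, is not part of the answer and is shown only as an alternative.

```javascript
// Hypothetical helper: prepend the BOM character U+FEFF to the text.
// Blob encodes string parts as UTF-8, so U+FEFF is written as EF BB BF.
function withUtf8Bom(content) {
  return new Blob(['\ufeff' + content], { type: 'text/plain;charset=utf-8' });
}

// Alternative (an assumption, not from the answer): supply the BOM as raw bytes.
function withUtf8BomBytes(content) {
  const bom = new Uint8Array([0xef, 0xbb, 0xbf]);
  return new Blob([bom, content], { type: 'text/plain;charset=utf-8' });
}

// Hypothetical helper: hex dump of the UTF-8 encoding of a string.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text))
    .map((b) => b.toString(16).padStart(2, '0'));
}

const content = '"my data"';
console.log(withUtf8Bom(content).size);      // 12: 3 BOM bytes + 9 content bytes
console.log(withUtf8BomBytes(content).size); // 12
console.log(utf8Hex('\ufeff' + content).slice(0, 3)); // [ 'ef', 'bb', 'bf' ]
```

Both variants should produce the same bytes. The string version is shorter, while the byte version makes the intended output explicit.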

See p. 36 of The Unicode Standard 5.0, Chapter 2, for a detailed explanation. A quote from that page:

The endian order entry for UTF-8 in Table 2-4 is marked N/A because UTF-8 code units are 8 bits in size, and the usual machine issues of endian order for larger code units do not apply. The serialized order of the bytes must not depart from the order defined by the UTF-8 encoding form. Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature.
