What does "The .NET framework uses the UTF-16 encoding standard by default" mean?

Question

My study guide (for the 70-536 exam) says this twice in the text and encoding chapter, which comes right after the IO chapter.

All the examples so far are to do with simple file access using FileStream and StreamWriter.

It also says things like "If you don't know what encoding to use when you create a file, don't specify one and .NET will use UTF16" and "Specify different encodings using Stream constructor overloads".

Never mind the fact that the actual overloads are on the StreamWriter class but hey, whatever.

I am looking at StreamWriter right now in Reflector and I am certain I can see that the default is actually UTF8NoBOM.

But none of this is listed in the errata. It's an old book (I checked the errata of both editions), so if it was wrong I would have thought someone had picked up on it.....

Makes me think maybe I didn't understand it.

So.....any ideas what it is talking about? Some other place where there is a default?

It has just completely confused me.

Recommended answer

"UTF-16" is an annoying term, as it has two meanings which are easily confused.

The first meaning is a series of 16-bit code units. Most of these correspond directly to the Unicode character of the same number; characters outside the Basic Multilingual Plane (U+10000 upwards) are stored as two 16-bit code units, each of them one of the surrogates.
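As a rough illustration of the surrogate mechanism, the following Python sketch splits a string into its 16-bit units. Python is used here purely for illustration and is not part of the original answer, and `utf16_code_units` is a hypothetical helper name; .NET exposes the same units directly as the `char` elements of a `string`.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit code units that represent `text` in UTF-16."""
    raw = text.encode("utf-16-le")  # two bytes per unit, no BOM
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

# A character inside the Basic Multilingual Plane needs a single unit.
print([hex(u) for u in utf16_code_units("A")])           # ['0x41']

# U+1F600 lies above U+FFFF, so it becomes a high and a low surrogate.
print([hex(u) for u in utf16_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
```

This is why the length of a UTF-16 string counts code units rather than characters: the second string above is one character but two units.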

Many languages use UTF-16 in this sense for internal storage purposes, including as a native string type. This is the usual source of phrases like ".NET (or Java) uses UTF-16 as its default encoding". .NET accesses the elements of such a UTF-16 string 16 bits at a time (i.e., at the implementation level, as a uint16).

The next thing to consider is the encoding of such a UTF-16 string into linear bytes, for storage in a file or network stream. As always when you store larger numbers into bytes, there are two possible encodings: little-endian or big-endian. So you can use "UTF-16LE", the little-endian encoding of UTF-16 into bytes, or "UTF-16BE", the big-endian encoding.
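To make the two byte orders concrete, here is a small Python sketch (the sample string is an arbitrary choice for illustration) that serializes the same two code units both ways:

```python
text = "A\u20ac"  # U+0041 and U+20AC (the euro sign)

little = text.encode("utf-16-le")  # least significant byte of each unit first
big = text.encode("utf-16-be")     # most significant byte of each unit first

print(little.hex())  # 4100ac20
print(big.hex())     # 004120ac

# Both byte sequences decode back to the same string
# when the matching byte order is used.
print(little.decode("utf-16-le") == big.decode("utf-16-be"))  # True
```

The code units are identical in both cases; only the order of the two bytes inside each unit differs.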

("UTF-16LE" is the more commonly used. Just to add more confusion to the flames, Windows gives it the deeply misleading and ambiguous encoding name "Unicode". In reality it is almost always better to use UTF-8 for file storage and network streams than either of UTF-16LE/BE.)

But if you don't know whether a bunch of bytes contains "UTF-16LE" or "UTF-16BE", you can use the trick of looking at the first code point to work it out. This code point, the Byte Order Mark (BOM), is only valid when read one way around, so you can't mistake one encoding for the other.

This approach, of not caring what byte order you have but using a BOM to signal it, is usually referred to under the encoding name... "UTF-16".
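The BOM is the character U+FEFF; read with the wrong byte order it comes out as U+FFFE, which is not a valid character, and that is what makes the detection unambiguous. A minimal Python sketch of the idea, assuming the data really is UTF-16 and starts with a BOM (`decode_utf16_with_bom` is a hypothetical helper name, not a library function):

```python
import codecs

def decode_utf16_with_bom(data):
    """Pick the byte order from a leading BOM, then decode the rest."""
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return data[2:].decode("utf-16-be")
    raise ValueError("no UTF-16 byte order mark found")

le_file = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
be_file = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")

print(decode_utf16_with_bom(le_file))  # hi
print(decode_utf16_with_bom(be_file))  # hi

# Python's generic "utf-16" codec performs the same detection when decoding.
print(be_file.decode("utf-16"))        # hi
```

The helper deliberately raises when no BOM is present, because without one the byte order can only be guessed.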

So, when someone says "UTF-16", you can't tell whether they mean a sequence of 16-bit code units, or a sequence of bytes in unspecified order that will decode to one.

("UTF-32" has the same problem.)

"If you don't know what encoding to use when you create a file, don't specify one and .NET will use UTF16"

If that's the actual direct quote it is a lie. Constructing a StreamWriter without an encoding argument is explicitly specified to give you UTF-8.
