Why does .NET use UTF-16 encoding for strings, but UTF-8 as the default for saving files?
Question
Essentially, string uses the UTF-16 character encoding form.
But when saving with StreamWriter:
This constructor creates a StreamWriter with UTF-8 encoding without a Byte-Order Mark (BOM).
And I've seen this sample:
And it looks like UTF-8 is smaller for some strings, while UTF-16 is smaller for others.
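As a rough illustration of that size difference, here is a minimal Python sketch (not .NET code, and not the sample referred to above). The helper name `encoded_sizes` is made up for this example, and `utf-16-le` is chosen so that no byte-order mark is counted.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given text, without any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# ASCII text: 1 byte per character in UTF-8, 2 bytes in UTF-16.
ascii_sizes = encoded_sizes("hello world")

# CJK text: 3 bytes per character in UTF-8, 2 bytes in UTF-16.
cjk_sizes = encoded_sizes("\u4f60\u597d\u4e16\u754c")

print(ascii_sizes)  # (11, 22)
print(cjk_sizes)    # (12, 8)
```

So mostly-ASCII content (source code, markup, English prose) tends to be smaller in UTF-8, while text dominated by CJK characters tends to be smaller in UTF-16.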
So why does .NET use UTF-16 as the default encoding for strings, but UTF-8 as the default for saving files?
Thank you.
P.S. I've already read the famous article.
Recommended answer
If you're happy ignoring surrogate pairs (or equivalently, the possibility of your app needing characters outside the Basic Multilingual Plane), UTF-16 has some nice properties, basically because every character then takes exactly one fixed-size code unit. You know how much space to allocate for a given number of code units, and you can index directly into that space to access the nth code unit. Those aren't usually important aspects for a text file - although they certainly are if you want to use random access - but size generally is important for text files.
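The allocation and indexing properties can be sketched in a few lines of Python (illustrative only; the helper `nth_code_unit` is a made-up name, and the text is assumed to contain only BMP characters):

```python
import struct


def nth_code_unit(raw, n):
    """Read the n-th 16-bit code unit directly at byte offset 2 * n."""
    return struct.unpack_from("<H", raw, 2 * n)[0]


text = "na\u00efve"                 # "naive" with a diaeresis on the i, BMP only
raw = text.encode("utf-16-le")      # little-endian UTF-16, no BOM

# Space needed is known up front: 2 bytes per code unit.
print(len(raw) == 2 * len(text))    # True

# Constant-time access: no scanning from the start of the buffer.
print(hex(nth_code_unit(raw, 2)))   # 0xef
```

With UTF-8 the same lookup would require walking the buffer from the start, because earlier characters may occupy a varying number of bytes.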
Consider the primitive type char. If we use UTF-8 as the in-memory representation and want to cope with all Unicode characters, how big should that be? It could be up to 4 bytes (the original UTF-8 design allowed up to 6, but RFC 3629 limits it to 4)... which means we'd always have to allocate 4 bytes. At that point we might as well use UTF-32!
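To make the variable width concrete, here is a small Python sketch (illustrative only; the sample characters are arbitrary picks) comparing per-character sizes in UTF-8 and UTF-32:

```python
# One sample character from each UTF-8 length class.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
utf32_lengths = [len(ch.encode("utf-32-le")) for ch in samples]

print(utf8_lengths)   # [1, 2, 3, 4]
print(utf32_lengths)  # [4, 4, 4, 4]
```

A fixed-size char type would have to be sized for the widest case, which is why a UTF-8-based char ends up no smaller than a UTF-32 one.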
Of course, we could use UTF-32 as the char representation, but UTF-8 in the string representation, converting as we go.
Where UTF-16 falls down of course is that the number of code units per Unicode character is variable... but in my experience relatively few apps actually handle non-BMP characters correctly anyway.
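A quick Python sketch of that variability (illustrative only; `utf16_units` is a made-up helper name): a BMP character takes one UTF-16 code unit, while a character outside the BMP takes a surrogate pair.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


bmp = utf16_units("\u20ac")          # euro sign, inside the BMP
non_bmp = utf16_units("\U0001F600")  # emoji, outside the BMP

print([hex(u) for u in bmp])      # ['0x20ac']
print([hex(u) for u in non_bmp])  # ['0xd83d', '0xde00']
```

In .NET, string.Length counts code units, so a string holding only that emoji reports a length of 2. Code that treats each char as a whole character will mishandle such input.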
(Additionally, I believe Windows uses UTF-16 for Unicode data, and it makes sense for .NET to follow suit for interop reasons. That just pushes the question on one step though.)