Unicode Normalization in Windows


Question

I've been using "unicode strings" in Windows for as long as I've known about Unicode (that is, since after graduating). However, it has always mystified me that the Win32 API uses the term "Unicode" very loosely. In particular, the "Unicode" variant that MSDN refers to is UTF-16 (although the "wide char" terminology dates from when it was UCS-2, which is not Unicode). MSDN makes almost no mention of Unicode normalization, though.

MSDN has a few pages about Unicode and Unicode Normalization Forms, and about the functions that change the normalization form. The page on normalization even says:

Win32 and the .NET Framework support all four normalization forms.
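For readers unfamiliar with the four forms (NFC, NFD, NFKC, NFKD), here is a minimal sketch of how they differ. It uses Python's standard `unicodedata` module as a portable stand-in rather than the Win32 normalization functions, and the sample string is an arbitrary choice for illustration.

```python
import unicodedata

# Illustrative sample: precomposed "é" (U+00E9) followed by the "fi" ligature (U+FB01).
# The first character shows canonical (de)composition, the second compatibility mapping.
sample = "\u00e9\ufb01"

# Apply each of the four normalization forms to the same input.
forms = {name: unicodedata.normalize(name, sample)
         for name in ("NFC", "NFD", "NFKC", "NFKD")}

for name, text in forms.items():
    # Print code points so that visually identical strings can be told apart.
    print(name, [f"U+{ord(ch):04X}" for ch in text])
```

The canonical forms (NFC, NFD) only compose or decompose the accent, while the compatibility forms (NFKC, NFKD) additionally replace the ligature with the plain letters "f" and "i".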

However, I haven't found anywhere in the docs which normalization form the Win32 API uses (or understands).

Question 1: which normalization form is used by default for user input (such as an Edit control) and for conversion through MultiByteToWideChar()?

Question 2: must the strings passed to Win32 API functions be in a particular normalization form, or are the kernel and file system normalization-agnostic?

Recommended answer

From the MSDN article:

Windows, Microsoft applications, and the .NET Framework generally generate characters in form C using normal input methods. For most purposes on Windows, form C is the preferred form. For example, characters in form C are produced by Windows keyboard input. However, characters imported from the Web and other platforms can introduce other normalization forms into the data stream.
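Since imported text may arrive in a form other than C, an application that wants consistent comparisons can normalize at its input boundary. The following is a minimal sketch of that idea in Python (3.8 or later, for `unicodedata.is_normalized`); the helper name `to_form_c` and the sample strings are hypothetical, chosen only for illustration.

```python
import unicodedata


def to_form_c(text: str) -> str:
    """Return text in form C (NFC), skipping the conversion if it already is."""
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)


typed = "caf\u00e9"       # precomposed "é", as keyboard input usually produces
imported = "cafe\u0301"   # "e" + combining acute accent, e.g. from another platform

# The two strings render identically but differ code point by code point.
print(typed == imported)

# After normalizing the imported text to form C, they compare equal.
print(typed == to_form_c(imported))
```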

Update: I've included some specific details relating to Question 2.

In regards to the file system, normalization is not required, according to the article Naming Files, Paths, and Namespaces:

There is no need to perform any Unicode normalization on path and file name strings for use by the Windows file I/O API functions because the file system treats path and file names as an opaque sequence of WCHARs. Any normalization that your application requires should be performed with this in mind, external of any calls to related Windows file I/O API functions.
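One consequence of the "opaque sequence of WCHARs" treatment is that two names which look identical can still be distinct names if they are in different normalization forms. The sketch below illustrates this in memory only, without touching a real file system; the helpers `utf16_units` and `canonical_name` are hypothetical, and picking NFC as the application's canonical form is an assumption.

```python
import unicodedata


def utf16_units(name: str) -> list:
    """Return the UTF-16 code units (the WCHAR values) that encode name."""
    data = name.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def canonical_name(name: str) -> str:
    """Normalize a name before it is handed to file I/O calls (NFC assumed here)."""
    return unicodedata.normalize("NFC", name)


name_c = "r\u00e9sum\u00e9.txt"                # form C
name_d = unicodedata.normalize("NFD", name_c)  # same text in form D

# Same rendered text, different code unit sequences: a file system that compares
# names as opaque code units would see two different names.
print(utf16_units(name_c) == utf16_units(name_d))

# Normalizing in the application, outside the file API, makes them agree.
print(canonical_name(name_c) == canonical_name(name_d))
```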

In regards to SQL Server, no normalization is required, nor is data normalized when it is saved in the database. That said, when comparing strings, SQL Server 2000 uses its own string normalization mechanism inside indexes, but I cannot find specific details on what that is. A SQL Server 2005 article states the same:

One important change in SQL Server 7.0 was the provision of an operating system–independent model for string comparison, so that the collations between all operating systems from Windows 95 through Windows 2000 would be consistent. This string comparison code was based on the same code that Windows 2000 uses for its own string normalization, and is encapsulated to be the same on all computers and in all versions of SQL Server.
