C++ strings: UTF-8 or 16-bit encoding?
Question
I'm still trying to decide whether my (home) project should use UTF-8 strings (implemented in terms of std::string with additional UTF-8-specific functions when necessary) or some 16-bit string (implemented as std::wstring). The project is a programming language and environment (like VB, it's a combination of both).
There are a few desires/constraints:
- It would be cool if it could run on limited hardware, such as computers with limited memory.
- I want the code to run on Windows, Mac and (if resources allow) Linux.
- I'll be using wxWidgets as my GUI layer, but I want the code that interacts with that toolkit confined in a corner of the codebase (I will have non-GUI executables).
- I would like to avoid working with two different kinds of strings when working with user-visible text and with the application's data.
Currently, I'm working with std::string, with the intent of using UTF-8 manipulation functions only when necessary. It requires less memory, and seems to be the direction many applications are going anyway.
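As an illustration of what a "UTF-8 manipulation function" layered on top of std::string could look like, here is a minimal sketch that counts code points rather than bytes. The name `utf8_length` is hypothetical (it is not part of the standard library or of the project described above), and the function assumes the input is already well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a standard library function.
// Counts Unicode code points in a UTF-8 encoded std::string by skipping
// continuation bytes (bytes of the form 10xxxxxx).
// Assumption: the input is well-formed UTF-8; no validation is done.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {  // lead byte or ASCII byte
            ++count;
        }
    }
    return count;
}
```

Note that `s.size()` still reports bytes, so any code that indexes or truncates the string has to decide explicitly whether it means bytes or code points.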
If you recommend a 16-bit encoding, which one: UTF-16? UCS-2? Another one?
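To make the memory and UTF-16 vs. UCS-2 trade-off concrete, here is a rough sketch that computes how many code units one code point needs in each encoding. The helper names `utf8_units` and `utf16_units` are hypothetical, and both assume `cp` is a valid Unicode scalar value (at most 0x10FFFF and not a surrogate).

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only.
// Assumption: cp is a valid Unicode scalar value.

// Number of 8-bit code units (bytes) UTF-8 needs for one code point.
constexpr std::size_t utf8_units(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Number of 16-bit code units UTF-16 needs for one code point.
// Code points above U+FFFF take a surrogate pair (two units),
// which UCS-2 cannot represent at all.
constexpr std::size_t utf16_units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}
```

In bytes, ASCII text costs 1 byte per character in UTF-8 and 2 in UTF-16, most CJK characters cost 3 bytes in UTF-8 and 2 in UTF-16, and code points outside the Basic Multilingual Plane cost 4 bytes in both. Either way, one "character" is not always one code unit.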
Recommended answer
I would recommend UTF-16 for any kind of data manipulation and UI. The Mac OS X and Win32 APIs use UTF-16, as do wxWidgets, Qt, ICU, Xerces, and others. UTF-8 might be better for data interchange and storage. See http://unicode.org/notes/tn12/.
But whatever you choose, I would definitely recommend against std::string with UTF-8 "only when necessary".
Go all the way with UTF-16 or UTF-8, but do not mix and match, that is asking for trouble.