C++ strings: UTF-8 or 16-bit encoding?


Question


I'm still trying to decide whether my (home) project should use UTF-8 strings (implemented in terms of std::string with additional UTF-8-specific functions when necessary) or some 16-bit string (implemented as std::wstring). The project is a programming language and environment (like VB, it's a combination of both).

There are a few wishes/constraints:

  • It would be cool if it could run on limited hardware, such as computers with limited memory.
  • I want the code to run on Windows, Mac and (if resources allow) Linux.
  • I'll be using wxWidgets as my GUI layer, but I want the code that interacts with that toolkit confined to one corner of the codebase (I will have non-GUI executables).
  • I would like to avoid working with two different kinds of strings when working with user-visible text and with the application's data.

Currently, I'm working with std::string, with the intent of using UTF-8 manipulation functions only when necessary. It requires less memory, and seems to be the direction many applications are going anyway.
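One practical consequence of storing UTF-8 in std::string: size(), operator[] and substr() count bytes, not characters, so any code that needs a character count has to decode the bytes itself. The sketch below assumes the string already holds valid UTF-8; `utf8_length` is a hypothetical helper, not a standard library or wxWidgets function. On the memory question, UTF-8 needs one byte per character for ASCII text versus two for UTF-16, but three bytes versus two for most CJK characters, so the saving depends on the text being stored.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any library): counts Unicode code points
// in a std::string that is assumed to hold valid UTF-8.
// Every code point starts with exactly one lead byte; continuation bytes
// have the bit pattern 10xxxxxx, so counting the bytes that are NOT
// continuation bytes gives the number of code points.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, with `std::string word = "h\xC3\xA9llo";` (the word "héllo"), `word.size()` is 6 while `utf8_length(word)` is 5.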

If you recommend a 16-bit encoding, which one: UTF-16? UCS-2? Another one?
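The difference between the two candidates shows up when encoding a single code point. UCS-2 is a fixed-width encoding limited to the Basic Multilingual Plane (U+0000 to U+FFFF); UTF-16 uses the same 16-bit units but adds surrogate pairs for U+10000 to U+10FFFF, so it covers all of Unicode. A minimal sketch follows, with `append_utf16` as a hypothetical name. It uses `std::u16string` (C++11) rather than `std::wstring`; that choice is an assumption made because `wchar_t` is 16 bits on Windows but typically 32 bits on Linux and macOS, which matters for a project targeting all three.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends one Unicode code point to a UTF-16 string.
// Code points in the Basic Multilingual Plane (U+0000..U+FFFF) take one
// 16-bit unit, which is all UCS-2 can express. Code points above U+FFFF
// need a surrogate pair, which UTF-16 supports and UCS-2 does not.
void append_utf16(std::u16string& out, char32_t cp) {
    // Precondition: cp is a Unicode scalar value (not a surrogate, <= U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        char32_t v = cp - 0x10000;                                   // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
}
```

Appending U+1F600 (an emoji outside the BMP) produces the two units 0xD83D 0xDE00; a UCS-2 string has no way to represent it.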

Answer

I would recommend UTF-16 for any kind of data manipulation and UI. The Mac OS X and Win32 APIs use UTF-16, as do wxWidgets, Qt, ICU, Xerces, and others. UTF-8 might be better for data interchange and storage. See http://unicode.org/notes/tn12/.
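If text is kept as UTF-16 internally and written out as UTF-8, the conversion happens only at I/O boundaries. The sketch below shows that step; `utf16_to_utf8` is a hypothetical name, and it assumes well-formed input (unpaired surrogates are not validated). In practice ICU, or `WideCharToMultiByte` with `CP_UTF8` on Windows, would do this job; the hand-written version only illustrates what the conversion involves.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 text (internal representation) to
// UTF-8 (for files or network interchange). Assumes well-formed input.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high surrogate and the following low surrogate into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        // Emit 1 to 4 UTF-8 bytes depending on the code point's range.
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Keeping the conversion in one function like this is what "go all the way" looks like in practice: the rest of the code only ever sees one representation.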

But whatever you choose, I would definitely recommend against storing UTF-8 in std::string and treating it as UTF-8 "only when necessary".

Go all the way with UTF-16 or UTF-8, but do not mix and match; that is asking for trouble.
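To illustrate the kind of trouble mixing brings, consider ordinary byte-oriented std::string code meeting UTF-8 data: `substr` can cut a character in half. The sketch below assumes valid UTF-8 input; `utf8_truncate` is a hypothetical helper, and each byte-oriented operation that can split a character would need a UTF-8-aware counterpart like it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: shortens a UTF-8 string to at most max_bytes bytes
// without cutting a multi-byte sequence in half. A plain
// s.substr(0, max_bytes) counts bytes, not characters, and can leave a
// dangling lead byte that makes the result invalid UTF-8.
std::string utf8_truncate(const std::string& s, std::size_t max_bytes) {
    if (s.size() <= max_bytes) {
        return s;
    }
    std::size_t cut = max_bytes;
    // Step back while the byte at the cut point is a continuation byte
    // (10xxxxxx); the cut then lands on the start of a code point.
    while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80) {
        --cut;
    }
    return s.substr(0, cut);
}
```

With `s = "caf\xC3\xA9"` ("café", 5 bytes), `s.substr(0, 4)` ends with the orphaned lead byte 0xC3, while `utf8_truncate(s, 4)` returns "caf".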
