C++ strings: UTF-8 or 16-bit encoding?

Question

I'm still trying to decide whether my (home) project should use UTF-8 strings (implemented in terms of std::string with additional UTF-8-specific functions when necessary) or some 16-bit string (implemented as std::wstring). The project is a programming language and environment (like VB, it's a combination of both).

There are some wishes/constraints:

  • It would be cool if it could run on limited hardware, such as computers with limited memory.
  • I want the code to run on Windows, Mac and (if resources allow) Linux.
  • I'll be using wxWidgets as my GUI layer, but I want the code that interacts with that toolkit confined in a corner of the codebase (I will have non-GUI executables).
  • I would like to avoid working with two different kinds of strings when working with user-visible text and with the application's data.

Currently, I'm working with std::string, with the intent of using UTF-8 manipulation functions only when necessary. It requires less memory, and seems to be the direction many applications are going anyway.
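To make "UTF-8 manipulation functions only when necessary" concrete, here is a minimal sketch of one such function. The name `utf8_length` is hypothetical, and the code assumes the string already holds valid UTF-8. It shows why `std::string::size()` (a byte count) is not enough once text leaves the ASCII range.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard library): counts Unicode
// code points in a std::string that is assumed to contain valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        // Continuation bytes have the bit pattern 10xxxxxx; every other
        // byte starts a new code point.
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 bytes of "naïve" (`"na\xC3\xAFve"`), `size()` reports 6 bytes while `utf8_length` reports 5 code points. Code points are still not the same as user-perceived characters, so this only covers the simplest case.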

If you recommend a 16-bit encoding, which one: UTF-16? UCS-2? Another one?
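The practical difference between UTF-16 and UCS-2 is that UTF-16 can represent code points above U+FFFF by using two 16-bit units (a surrogate pair), while UCS-2 cannot. The sketch below illustrates that encoding step. The helper name `append_utf16` is hypothetical, and it assumes its input is a valid Unicode scalar value. It uses `std::u16string` rather than `std::wstring` because `wchar_t` is 16 bits on Windows but commonly 32 bits on Linux and macOS.

```cpp
#include <string>

// Hypothetical helper: appends one code point to a UTF-16 string.
// Assumes cp is a valid Unicode scalar value (at most 0x10FFFF and
// not itself in the surrogate range 0xD800..0xDFFF).
void append_utf16(std::u16string& out, char32_t cp) {
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit, same as UCS-2.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20 remaining bits into a
        // high surrogate and a low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}
```

Appending U+00E9 adds one unit; appending U+1F600 adds two (0xD83D, 0xDE00). So a "16-bit string" in UTF-16 is still a variable-width encoding, just with fewer multi-unit cases than UTF-8.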

Recommended answer

I would recommend UTF-16 for any kind of data manipulation and UI. The Mac OS X and Win32 APIs use UTF-16, as do wxWidgets, Qt, ICU, Xerces, and others. UTF-8 might be better for data interchange and storage. See http://unicode.org/notes/tn12/.

But whatever you choose, I would definitely recommend against std::string with UTF-8 "only when necessary".

Go all the way with UTF-16 or UTF-8, but do not mix and match; that is asking for trouble.
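One way to read "do not mix and match" is to keep a single encoding inside the program and convert only at the edges (file I/O, toolkit calls). The following is a rough sketch of such a boundary function, under the assumption that the internal encoding is UTF-16 and external data arrives as valid UTF-8. The name `utf8_to_utf16` is hypothetical, and there is no handling of malformed input; a real project would more likely rely on a library such as ICU.

```cpp
#include <cstddef>
#include <string>

// Hypothetical boundary helper: converts UTF-8 text to UTF-16.
// Assumes the input is valid UTF-8; malformed bytes are not detected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        // The lead byte determines the sequence length and its payload bits.
        if (b0 < 0x80)      { cp = b0;        len = 1; }
        else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; }
        else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; }
        else                { cp = b0 & 0x07; len = 4; }
        // Each continuation byte contributes 6 more bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Encode supplementary code points as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With a function like this (and its inverse) confined to the I/O layer, the rest of the codebase sees only one string type, which is the point of the advice above.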
