UTF-8 Compatibility in C++

Question

I am writing a program that needs to be able to work with text in all languages. My understanding is that UTF-8 will do the job, but I am experiencing a few problems with it.

Am I right to say that UTF-8 can be stored in a simple char in C++? If so, why do I get the following warning when my program uses char, string and stringstream: warning C4566: character represented by universal-character-name '\uFFFD' cannot be represented in the current code page (1252). (I do not get that warning when I use wchar_t, wstring and wstringstream.)

Additionally, I know that UTF-8 is variable length. When I use the at or substr string methods, would I get the wrong answer?

Recommended answer

To use UTF-8 string literals you need to prefix them with u8; otherwise you get the implementation's execution character set (in your case, it seems to be Windows-1252). u8"\uFFFD" is a null-terminated sequence of bytes holding the UTF-8 representation of the replacement character (U+FFFD). It has type char const[4] (char8_t const[4] since C++20).
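As a rough illustration of the point above, the following sketch inspects the bytes that a u8 literal produces. The helper names utf8_bytes and replacement_character are hypothetical and not part of the original answer; the template is only there so the same code compiles whether the literal's element type is char or char8_t.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies the bytes of a u8 literal (without the
// terminating null) into a std::string. Being a template, it accepts
// both char arrays (C++11 to C++17) and char8_t arrays (C++20 and later).
template <typename CharT, std::size_t N>
std::string utf8_bytes(const CharT (&literal)[N]) {
    std::string out;
    for (std::size_t i = 0; i + 1 < N; ++i) {
        out.push_back(static_cast<char>(literal[i]));
    }
    return out;
}

// U+FFFD encodes as three UTF-8 bytes, so the array has four elements
// once the terminating null is counted.
static_assert(sizeof(u8"\uFFFD") == 4, "3 UTF-8 bytes plus terminating null");

// Hypothetical helper: returns the UTF-8 bytes of U+FFFD (EF BF BD).
std::string replacement_character() {
    std::string s = utf8_bytes(u8"\uFFFD");
    assert(s.size() == 3);
    return s;
}
```

Because the u8 prefix fixes the encoding of the literal, the compiler no longer has to map \uFFFD into code page 1252, which is the mapping that warning C4566 reports as impossible.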

Since UTF-8 is a variable-length encoding, all kinds of indexing, including at and substr, work in code units (bytes), not code points. Random access to code points in a UTF-8 sequence is not possible because of its variable-length nature. If you want random access you need to use a fixed-length encoding, like UTF-32. For that you can use the U prefix on string literals.
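To make the difference between code units and code points concrete, here is a minimal sketch. The function names count_code_points, sample_utf8, sample_utf32 and check_sample are assumptions made for illustration, and the counting loop assumes well-formed UTF-8 input. The UTF-8 sample is spelled out as raw bytes so that it does not depend on the source file's encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 byte string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes well-formed UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++n;
    }
    return n;
}

// U+00E9 followed by U+20AC, written as UTF-8 bytes:
// U+00E9 is C3 A9 and U+20AC is E2 82 AC.
std::string sample_utf8() { return "\xC3\xA9\xE2\x82\xAC"; }

// The same two characters as UTF-32: one code unit per code point.
std::u32string sample_utf32() { return U"\u00E9\u20AC"; }

// Walks through the difference between the two representations.
void check_sample() {
    const std::string narrow = sample_utf8();
    const std::u32string wide = sample_utf32();

    assert(narrow.size() == 5);               // five code units (bytes)
    assert(count_code_points(narrow) == 2);   // but only two code points

    // at(1) lands in the middle of the first character: it returns the
    // continuation byte A9, not the second character.
    assert(static_cast<unsigned char>(narrow.at(1)) == 0xA9);

    // In UTF-32, index 1 really is the second code point.
    assert(wide.size() == 2);
    assert(wide[1] == U'\u20AC');
}
```

The same caveat applies to substr: a call such as substr(0, 1) on the UTF-8 string would return only the first byte of a two-byte sequence, which is not valid UTF-8 on its own.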
