Multi-Byte UTF-8 in Arrays in C++

Question

I have been having trouble working with 3-byte Unicode UTF-8 characters in arrays. When they are in char arrays I get multi-character character constant and implicit constant conversion warnings, but when I use wchar_t arrays, wcout returns nothing at all. Because of the nature of the project, it must be an array and not a string. Below is an example of what I've been trying to do.

#include <iostream>
#include <string>
using namespace std;
int main()
{
    wchar_t testing[40];
    testing[0] = L'\u0B95';
    testing[1] = L'\u0BA3';
    testing[2] = L'\u0B82';
    testing[3] = L'\0';
    wcout << testing[0] << endl;
    return 0;
}

Any suggestions? I'm working with OSX.

Recommended answer

Since '\u0B95' requires 3 bytes in UTF-8, gcc treats it as a multicharacter literal. A multicharacter literal has type int and an implementation-defined value. (Actually, I don't think gcc is correct to do this.)
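
To make the "3 bytes" claim concrete, here is a minimal sketch that measures the UTF-8 encoding of U+0B95 at compile time. The function name `ka_utf8_length` is invented for this illustration and is not part of the original answer.

```cpp
#include <cstddef>

// Number of UTF-8 code units (bytes) in the encoding of U+0B95,
// not counting the terminating null of the string literal.
// sizeof gives the same result whether u8 literals are arrays of
// char (C++11 to C++17) or char8_t (C++20), since both are one byte wide.
constexpr std::size_t ka_utf8_length() {
    return sizeof(u8"\u0B95") - 1;
}

static_assert(ka_utf8_length() == 3, "U+0B95 encodes to three UTF-8 bytes");
static_assert(sizeof(char) == 1, "a single char holds only one of those bytes");
```

Because one char can hold only one of those three bytes, a plain character literal for this code point cannot be an ordinary single-byte char value, which is where the warnings come from.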

Putting the L prefix before the literal gives it type wchar_t and an implementation-defined value (it maps to a value in the execution wide-character set, which is an implementation-defined superset of the basic execution wide-character set).

The C++11 standard provides some more Unicode-aware types and literals. The additional types are char16_t and char32_t, which hold UTF-16 and UTF-32 code units respectively. For a character in the basic multilingual plane, the value of either type is simply the character's Unicode code point.
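
The following sketch checks these properties at compile time. The variable names (`tamil_ka`, `beyond_bmp`, `surrogate_units`) are chosen for this illustration only, and U+1F600 is just an arbitrary code point outside the basic multilingual plane.

```cpp
#include <cstddef>
#include <type_traits>

// The u and U prefixes select char16_t and char32_t literals.
static_assert(std::is_same<decltype(u'\u0B95'), char16_t>::value, "u'' is char16_t");
static_assert(std::is_same<decltype(U'\u0B95'), char32_t>::value, "U'' is char32_t");

// A BMP character fits in one char16_t whose value is the code point.
constexpr char16_t tamil_ka = u'\u0B95';
static_assert(tamil_ka == 0x0B95, "value equals the code point");

// Any code point, including those above U+FFFF, fits in one char32_t.
constexpr char32_t beyond_bmp = U'\U0001F600';
static_assert(beyond_bmp == 0x1F600, "char32_t holds any code point");

// In UTF-16 the same code point needs two code units (a surrogate pair).
constexpr std::size_t surrogate_units = sizeof(u"\U0001F600") / sizeof(char16_t) - 1;
static_assert(surrogate_units == 2, "non-BMP code points take two char16_t");
```

For the Tamil characters in the question, which all lie in the basic multilingual plane, one array element per character is enough with either type.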

Since you need character literals to store characters from the basic multilingual plane, you'll need a char16_t literal. This can be written as, for example, u'\u0B95'. You can therefore write your code as follows, with no warnings or errors:

char16_t testing[40];
testing[0] = u'\u0B95';
testing[1] = u'\u0BA3';
testing[2] = u'\u0B82';
testing[3] = u'\0';

Unfortunately, the I/O library does not play nicely with these new types.
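
One possible workaround, sketched here under the assumption that the terminal expects UTF-8, is to encode the char16_t array as UTF-8 by hand and send the resulting bytes through the narrow stream. The helper `to_utf8` is hypothetical and not part of the standard library or the original answer; it does not validate unpaired surrogates.

```cpp
#include <string>

// Hypothetical helper: encodes a null-terminated UTF-16 array as UTF-8.
// Sketch only: unpaired surrogates are passed through without validation.
std::string to_utf8(const char16_t* s) {
    std::string out;
    for (; *s != u'\0'; ++s) {
        char32_t cp = *s;
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[1] - 0xDC00);
            ++s;
        }
        if (cp < 0x80) {                       // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: rest of the BMP
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes: beyond the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With the array from the snippet above, `std::cout << to_utf8(testing) << '\n';` (after including `<iostream>`) would write the nine UTF-8 bytes for the three characters. Whether they display correctly depends on the terminal's encoding and font.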

If you do not truly require using character literals as above, you may make use of the new UTF-8 string literals:

const char* testing = u8"\u0B95\u0BA3\u0B82";

This will encode the characters as UTF-8.
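
One caveat worth knowing: since C++20, u8 string literals have type const char8_t[N], so the `const char*` declaration above compiles only up to C++17. Before C++20, `const char testing[] = u8"...";` gives an array rather than a pointer, which may suit the array requirement in the question. To inspect what the literal actually contains, here is a small sketch; the helper `hex_bytes` is invented for this illustration and accepts either element type.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: formats the code units of a string literal as hex,
// skipping the terminating null. The template accepts both
// const char[N] (C++11 to C++17) and const char8_t[N] (C++20 and later).
template <typename CharT, std::size_t N>
std::string hex_bytes(const CharT (&literal)[N]) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i + 1 < N; ++i) {
        const unsigned char byte = static_cast<unsigned char>(literal[i]);
        if (!out.empty()) out += ' ';   // separate bytes with a space
        out += digits[byte >> 4];       // high nibble
        out += digits[byte & 0x0F];     // low nibble
    }
    return out;
}
```

Calling `hex_bytes(u8"\u0B95\u0BA3\u0B82")` yields `E0 AE 95 E0 AE A3 E0 AE 82`, that is, three bytes for each of the three characters.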
