How to use Unicode in C++?

Views: 203 Published: 2016/10/22 17:50:31 Tags: c++ string unicode

Question

Assuming a very simple program that:

asks for a name.

That part is simple, but my problem is that I don't know how to do the same thing if I enter the name using Japanese characters.

So, if you know how to do this in C++, please show me an example (that I can compile and test).

Thanks.

user362981: Thanks for your help. I compiled the code that you wrote without a problem, then the console window appears and I cannot enter any Japanese characters in it (using the IME). Also, if I change a word in your code ("hello") to one that contains Japanese characters, it will not display those either.

Svisstack: Thanks for your help as well. But when I compile your code I get the following errors:

warning: deprecated conversion from string constant to 'wchar_t*'
error: too few arguments to function 'int swprintf(wchar_t*, const wchar_t*, ...)'
error: at this point in file
warning: deprecated conversion from string constant to 'wchar_t*'

Recommended answer

You're going to get a lot of answers about wide characters. Wide characters, specifically wchar_t, do not equal Unicode. You can use them (with some pitfalls) to store Unicode, just as you can use an unsigned char. wchar_t is extremely system-dependent. To quote the Unicode Standard, version 5.2, chapter 5:

  With the wchar_t wide character type, ANSI/ISO C provides for inclusion of fixed-width,
  wide characters. ANSI/ISO C leaves the semantics of the wide character set to the specific
  implementation but requires that the characters from the portable C execution set
  correspond to their wide character equivalents by zero extension.

And

  The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently,
  programs that need to be portable across any C or C++ compiler should not use wchar_t
  for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide
  characters, which may be Unicode characters in some compilers.

So, it's implementation-defined. Here are two implementations: on Linux, wchar_t is 4 bytes wide and represents text in the UTF-32 encoding (regardless of the current locale), in whichever byte order, BE or LE, is native to your system. Windows, however, has a 2-byte-wide wchar_t and uses it to represent UTF-16 code units. Completely different.

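To make the width difference concrete, here is a minimal sketch. The helper names `utf16_units` and `wchar_bits` are mine, not part of any library, and the code assumes a C++11 compiler. It does not reject surrogate code points; it only shows that a code point above U+FFFF needs two UTF-16 code units, which is what a 2-byte wchar_t has to cope with.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// How many UTF-16 code units a single Unicode code point occupies.
// Hypothetical helper for illustration only.
std::size_t utf16_units(char32_t code_point) {
    assert(code_point <= 0x10FFFF);  // Unicode ends at U+10FFFF
    return code_point > 0xFFFF ? 2 : 1;  // above the BMP: a surrogate pair
}

// Width of wchar_t in bits for the compiler building this file.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}
```

With these, `utf16_units(0x4F50)` (the character 佐) gives 1 and `utf16_units(0x1F600)` gives 2, while `wchar_bits()` reports whatever your compiler chose.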
A better path: learn about locales, as you'll need to know that. For example, because I have my environment set up to use UTF-8 (Unicode), the following program will use Unicode:

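#include <clocale>   // std::setlocale
#include <iostream>
#include <string>    // std::string, std::getline

int main()
{
    // Adopt the locale from the user's environment (e.g. en_US.UTF-8)
    // instead of the default "C" locale.
    std::setlocale(LC_ALL, "");
    std::cout << "What's your name? ";
    std::string name;
    std::getline(std::cin, name);  // reads raw bytes; nothing is decoded here
    std::cout << "Hello there, " << name << "." << std::endl;
    return 0;
}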

 ... 
$ ./uni_test
What's your name? 佐藤 幹夫
Hello there, 佐藤 幹夫.
$ echo $LANG
en_US.UTF-8

But there's nothing Unicode about it. It merely reads in characters, which come in as UTF-8 because I have my environment set that way. I could just as easily say "heck, I'm part Czech, let's use ISO-8859-2": suddenly, the program is getting input in ISO-8859-2, but since it's just regurgitating it, it doesn't matter; the program will still perform correctly.

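A small sketch of what "merely reads in characters" means in practice: a std::string is a sequence of bytes, and its size() counts bytes, not characters. The helper `utf8_code_points` below is a hypothetical name, and it assumes the input is already valid UTF-8; it performs no validation.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a string that is ASSUMED to be valid UTF-8.
// Every code point starts with exactly one byte that is not a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx).
std::size_t utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char byte : bytes) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the two characters 佐藤 encoded as UTF-8 (bytes E4 BD 90 E8 97 A4), size() is 6 while `utf8_code_points` returns 2. Feed the same function ISO-8859-2 bytes and the count is meaningless, which is the point: the program has to know the encoding, the string does not carry it.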
Now, if that example had read in my name and then tried to write it out into an XML file, and stupidly wrote <?xml version="1.0" encoding="UTF-8" ?> at the top, it would be right when my terminal was in UTF-8, but wrong when my terminal was in ISO-8859-2. In the latter case, it would need to convert the name before serializing it to the XML file. (Or just write ISO-8859-2 as the encoding for the XML file.)

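One way to avoid that mismatch is to never hard-code the encoding name. A sketch, where `xml_declaration` is a hypothetical helper and the caller is assumed to know which encoding the bytes are really in (on POSIX systems, `nl_langinfo(CODESET)` from <langinfo.h> is one place to ask for the current locale's encoding name):

```cpp
#include <string>

// Builds an XML declaration that names the encoding the payload is actually in.
// The caller is responsible for passing the true encoding of the data.
std::string xml_declaration(const std::string& encoding) {
    return "<?xml version=\"1.0\" encoding=\"" + encoding + "\" ?>";
}
```

If the name was read from an ISO-8859-2 terminal and is written out unconverted, the declaration should come from `xml_declaration("ISO-8859-2")`, not from a fixed UTF-8 string.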
On many POSIX systems, the current locale is typically UTF-8, because it provides several advantages to the user, but this isn't guaranteed. Just outputting UTF-8 to stdout will usually be correct, but not always. Say I am using ISO-8859-2: if you mindlessly output an ISO-8859-1 "è" (0xE8) to my terminal, I'll see a "č" (0xE8). Likewise, if you output a UTF-8 "è" (0xC3 0xA8), I'll see (ISO-8859-2) "Ă¨" (0xC3 0xA8). This barfing of incorrect characters has been called Mojibake.

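The byte values in that example can be reproduced with a few lines of code. ISO-8859-1 is the one legacy encoding that is trivial to convert by hand, because its byte values equal the Unicode code points U+0000 to U+00FF. The function name `latin1_to_utf8` is my own; this is a sketch for this single encoding and does not generalize to ISO-8859-2 or Windows-1252.

```cpp
#include <string>

// Converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Bytes below 0x80 are ASCII and pass through unchanged; bytes 0x80..0xFF
// become a two-byte UTF-8 sequence 110000xx 10xxxxxx.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: every byte doubles
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8.push_back(static_cast<char>(byte));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

Passing the single byte 0xE8 ("è" in ISO-8859-1) yields the two bytes 0xC3 0xA8, the same pair that shows up as "Ă¨" on an ISO-8859-2 terminal.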
Often, you're just shuffling data around, and the encoding doesn't matter much. It typically comes into play when you need to serialize data. (Many internet protocols use UTF-8 or UTF-16, for example: if you got data from an ISO-8859-2 terminal, or a text file encoded in Windows-1252, then you have to convert it, or you'll be sending Mojibake.)

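For the general conversion, one of the libraries alluded to is POSIX iconv. The sketch below is hedged in several ways: `convert_encoding` is a hypothetical wrapper name, it assumes a POSIX system where `iconv` takes a `char**` input pointer (as glibc does; some platforms declare `const char**`), it assumes the requested encodings are installed, and it skips the final flush that stateful target encodings would need (UTF-8 is not one of them).

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts `input` from encoding `from` to encoding `to` using POSIX iconv.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the order: target first
    if (cd == (iconv_t)(-1)) {
        throw std::runtime_error("conversion not supported on this system");
    }

    std::string output;
    char buffer[256];
    char* in = const_cast<char*>(input.data());  // iconv does not modify the input
    std::size_t in_left = input.size();

    while (in_left > 0) {
        char* out = buffer;
        std::size_t out_left = sizeof buffer;
        const std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        const int err = errno;  // save before any other call can change it
        output.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the scratch buffer filled up; loop and continue.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("invalid or incomplete input sequence");
        }
    }

    iconv_close(cd);
    return output;
}
```

As a usage example, `convert_encoding("\xE8", "ISO-8859-2", "UTF-8")` should turn the ISO-8859-2 byte for "č" into the UTF-8 bytes 0xC4 0x8D, provided the system's iconv knows ISO-8859-2.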
Sadly, this is about the state of Unicode support in both C and C++. You have to remember: these languages are really system-agnostic, and don't bind to any particular way of doing it. That includes character sets. There are tons of libraries out there, however, for dealing with Unicode and other character sets.

In the end, it's not all that complicated really: know what encoding your data is in, and know what encoding your output should be in. If they're not the same, you need to do a conversion. This applies whether you're using std::cout or std::wcout. In my examples, stdin or std::cin and stdout/std::cout were sometimes in UTF-8, sometimes ISO-8859-2.
