C++ Windows Decimal to UTF-8 Character Conversion


Problem Description


I've been using the function below to convert from the decimal representation of a Unicode character (its code point) to the UTF-8 character itself in C++. The function I have at the moment works well on Linux/Unix systems, but it keeps returning the wrong characters on Windows.

void GetUnicodeChar(unsigned int code, char chars[5]) {
    if (code <= 0x7F) {
        chars[0] = (code & 0x7F); chars[1] = '\0';
    } else if (code <= 0x7FF) {
        // one continuation byte
        chars[1] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[0] = 0xC0 | (code & 0x1F); chars[2] = '\0';
    } else if (code <= 0xFFFF) {
        // two continuation bytes
        chars[2] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[1] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[0] = 0xE0 | (code & 0xF); chars[3] = '\0';
    } else if (code <= 0x10FFFF) {
        // three continuation bytes
        chars[3] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[2] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[1] = 0x80 | (code & 0x3F); code = (code >> 6);
        chars[0] = 0xF0 | (code & 0x7); chars[4] = '\0';
    } else {
        // Unicode replacement character U+FFFD (UTF-8 bytes EF BF BD)
        chars[0] = 0xEF; chars[1] = 0xBF; chars[2] = 0xBD;
        chars[3] = '\0';
    }
}

Can anyone provide an alternative function or a fix for the current function I'm using that will work on Windows?

--UPDATE--

INPUT: 225
OUTPUT ON OSX: á
OUTPUT ON WINDOWS: ├í

Solution

You don't show your code for printing, but presumably you're doing something like this:

char s[5];
GetUnicodeChar(225, s);
std::cout << s << '\n';

The reason you're getting correct output on OS X and garbled output on Windows is that OS X uses UTF-8 as its default encoding, while Windows uses a legacy code page. So when you output UTF-8 on OS X, OS X assumes (correctly) that it's UTF-8 and displays it as such. When you output UTF-8 on Windows, Windows assumes (incorrectly) that it's some other encoding.
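To see concretely what Windows is misinterpreting, you can dump the raw bytes the function produces. For code point 225 they are C3 A1, the two-byte UTF-8 sequence for á; code page 437 maps 0xC3 to ├ and 0xA1 to í, which is exactly the garbage shown above. A minimal sketch (it assumes the GetUnicodeChar from the question is in scope):

#include <cstdio>

int main() {
    char s[5];
    GetUnicodeChar(225, s);  // the question's function

    // Dump the raw UTF-8 bytes; for 225 this prints "C3 A1".
    for (int i = 0; s[i] != '\0'; ++i)
        std::printf("%02X ", static_cast<unsigned char>(s[i]));
    std::printf("\n");
}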

You can simulate the problem on OS X using the iconv program with the following command in Terminal.app

iconv -f cp437 -t utf8 <<< "á" 

This takes the UTF-8 string, reinterprets it as a string encoded in Windows code page 437, and converts that to UTF-8 for display. The output on OS X is ├í.

For testing small things you can do the following to properly display UTF-8 data on Windows.

#include <windows.h>   // declares SetConsoleOutputCP and CP_UTF8
#include <cstdio>

char s[5];
GetUnicodeChar(225, s);

SetConsoleOutputCP(CP_UTF8);
std::printf("%s\n", s);

Also, parts of Windows' implementation of the standard library don't support output of UTF-8, so even after you change the output encoding, code like std::cout << s still won't work.
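If you need UTF-8 output to work regardless of those standard-library quirks, one common Windows approach (not part of the original answer, just a sketch of a well-known alternative) is to convert the UTF-8 text to UTF-16 and hand it to WriteConsoleW, which bypasses the console output code page entirely:

#include <windows.h>

// Sketch: write a null-terminated UTF-8 string to the console by
// converting it to UTF-16 first. Only works when stdout is a real
// console, not when output is redirected to a file or pipe.
void PrintUtf8ToConsole(const char* utf8) {
    // First call computes the required number of UTF-16 code units
    // (including the terminating null, because cbMultiByte is -1).
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, nullptr, 0);
    if (len <= 0 || len > 256) return;   // keep the sketch simple

    wchar_t buf[256];
    MultiByteToWideChar(CP_UTF8, 0, utf8, -1, buf, len);

    DWORD written = 0;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), buf, len - 1, &written, nullptr);
}

Called as PrintUtf8ToConsole(s) after GetUnicodeChar(225, s), this prints á without touching the console code page.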


On a side note, taking an array as a parameter like this:

void GetUnicodeChar(unsigned int code, char chars[5]) { 

is a bad idea: the size in an array parameter is ignored (the parameter decays to a plain char*), so the compiler will not catch mistakes such as:

char *s; GetUnicodeChar(225, s);
char s[1]; GetUnicodeChar(225, s);

You can avoid these specific problems by changing the function to take a reference to an array instead:

void GetUnicodeChar(unsigned int code, char (&chars)[5]) { 

However, in general I'd recommend avoiding raw arrays altogether. You can use std::array if you really want an array. You can use std::string if you want text, which IMO is a good choice here:

std::string GetUnicodeChar(unsigned int code);
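For completeness, here is one way that std::string version might be written. This is an illustrative sketch following the same encoding logic as the original function, not code from the answer:

#include <string>

// Sketch: same UTF-8 encoding as the question's function, but returning
// a std::string so callers can't pass an undersized buffer.
std::string GetUnicodeChar(unsigned int code) {
    std::string out;
    if (code <= 0x7F) {
        out += static_cast<char>(code);
    } else if (code <= 0x7FF) {
        out += static_cast<char>(0xC0 | (code >> 6));
        out += static_cast<char>(0x80 | (code & 0x3F));
    } else if (code <= 0xFFFF) {
        out += static_cast<char>(0xE0 | (code >> 12));
        out += static_cast<char>(0x80 | ((code >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code & 0x3F));
    } else if (code <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (code >> 18));
        out += static_cast<char>(0x80 | ((code >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((code >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (code & 0x3F));
    } else {
        out = "\xEF\xBF\xBD";  // U+FFFD, the Unicode replacement character
    }
    return out;
}

With this signature the printing code simply becomes std::cout << GetUnicodeChar(225); (still subject to the Windows console caveats above).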
