PHP和C ++的UTF-8代码单元以汉字的相反顺序 [英] PHP and C++ for UTF-8 code unit in reverse order in Chinese character

查看：207 发布时间：2016/10/14 11:35:10 php c++ c utf-8 cjk

本文介绍了PHP和C ++的UTF-8代码单元以汉字的相反顺序的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

中文字你好的unicode代码点分别是4F60,597D。我从此工具获得了 http://rishida.net/tools/conversion/

下面的控制台应用程序将你好
的十六进制字节序列打印为60：4F：7D：59。正如你可以看到，它是与每个字符的unicode代码点的相反顺序。 60先然后4F，而不是4F然后60.为什么会这样？谁是正确的？工具或控制台应用程序？或两者？

  void printHex（char * buf，char * filename）
 {
 FILE * fp ; 
 fp = fopen（filename，w）; 
 
 if（fp == NULL）return; 
 
 int len2 = sizeof（buf）; 
 int i; 
 char store [10]; 
 for（i = 0; i  {
 if（i> 0）fprintf（fp，：）; 
 // sprintf（store，）; 
 
 fprintf（fp，％02X，buf [i]）; 
} 
 fprintf（fp，\\\
）; 
 fclose（fp）; 
} 
 
 int main（int argc，char * argv []）
 {
 char * str3 =（char *）（L你好）; 
 printHex（str3，C：\\Users\\william\\Desktop\\My Document\\\test2.txt）; 
 
 return 0; 
}

在PHP中，当我使用这个mb_convert_encoding函数。

  echo bin2hex（mb_convert_encoding（你好，UTF-16，UTF-8）） // result：4f60 597d 
 echo bin2hex（mb_convert_encoding（부絙，UTF-16，UTF-8））; // result：604f 7d59

PHP的结果与在线工具相同，这个编码打印你好在打印机上使用php_printer.dll函数，打印输出变成ㄧ絙，反之亦然。但是C ++应用程序可以正确打印。什么可能是错误的PHP？解决方案？

解决方案

区别在于字节序。

我的猜测是UTF-16将默认输出字符串为little-endian。

那么，或者完全相反;）

请注意，这些不是unicode代码点，而是UTF-16BE / LE / UCS-2字节表示。

编辑：使用 UTF-16LE mb_convert_encoding 会给你相反的表示。

The unicode code point for the Chinese word 你好 is 4F60 , 597D respectively. which I got from this tool http://rishida.net/tools/conversion/

The console application below will print out the hexadecimal byte sequence of 你好 as 60:4F:7D:59 . As you can see it's in reverse order of the unicode code point for each character. 60 first then 4F, instead of 4F then 60. Why is it so ? Who is correct ? The tools or the console app ? Or both ?

void printHex (char * buf, char *filename)
{
    FILE *fp;
    fp=fopen(filename, "w");

    if(fp == NULL) return;

    int len2 = sizeof(buf);
    int i;
    char store[10];
    for (i = 0; i < sizeof(buf); i++)
    {
        if (i > 0) fprintf(fp,":");
        //sprintf(store, );

        fprintf(fp,"%02X", buf[i]);
    }
    fprintf(fp,"\n");
    fclose(fp);
}

int main(int argc, char* argv[])
{
    char * str3 = (char*)(L"你好");
    printHex( str3, "C:\\Users\\william\\Desktop\\My Document\\test2.txt");

        return 0;
}

While in PHP when I use this mb_convert_encoding function.

echo bin2hex(mb_convert_encoding("你好", "UTF-16", "UTF-8")); //result : 4f60 597d
echo bin2hex(mb_convert_encoding("恏絙", "UTF-16", "UTF-8")); //result : 604f 7d59

The PHP has the result same as the online tool, but when I use this encoding to print 你好 on a printer using php_printer.dll functions, the print out become 恏絙 and vice versa. But the C++ application can print out correctly. What could be wrong with PHP ? And the solution?

解决方案

They're both correct. The difference is in endian-ness.

My guess is that UTF-16 will output the string as little-endian by default. You can enforce big-endianness by using UTF-16BE instead.

That, or the exact reverse ;)

Note that these are not unicode codepoints, but rather the UTF-16BE/LE/UCS-2 byte representation. Codepoints are a different set of numbers.

EDIT: Using UTF-16LE in mb_convert_encoding will give you to the reverse representation.

这篇关于PHP和C ++的UTF-8代码单元以汉字的相反顺序的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

PHP和C ++的UTF-8代码单元以汉字的相反顺序 [英] PHP and C++ for UTF-8 code unit in reverse order in Chinese character

问题描述

相关文章

PHP最新文章

热门教程

热门工具

登录关闭

PHP和C ++的UTF-8代码单元以汉字的相反顺序 [英] PHP and C++ for UTF-8 code unit in reverse order in Chinese character

问题描述

相关文章

PHP最新文章

热门教程

热门工具

登录 关闭

登录关闭