PHP and C++ for UTF-8 code unit in reverse order in Chinese character

Problem description

The Unicode code points for the Chinese word 你好 are 4F60 and 597D respectively, which I got from this tool: http://rishida.net/tools/conversion/

The console application below prints the hexadecimal byte sequence of 你好 as 60:4F:7D:59. As you can see, the bytes of each character are in the reverse order of its Unicode code point: 60 first and then 4F, instead of 4F and then 60. Why is that? Which is correct: the tool, the console app, or both?

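#include <stdio.h>
#include <wchar.h>

/* Writes len bytes of buf to filename as colon-separated hex pairs. */
void printHex(const unsigned char *buf, size_t len, const char *filename)
{
    FILE *fp = fopen(filename, "w");
    if (fp == NULL) return;

    for (size_t i = 0; i < len; i++)
    {
        if (i > 0) fprintf(fp, ":");
        /* unsigned char avoids sign extension for bytes >= 0x80 */
        fprintf(fp, "%02X", buf[i]);
    }
    fprintf(fp, "\n");
    fclose(fp);
}

int main(int argc, char* argv[])
{
    const wchar_t *wstr = L"你好";
    /* Pass the byte length explicitly: sizeof on a char* parameter
       yields the pointer size, not the length of the string. */
    printHex((const unsigned char *)wstr, wcslen(wstr) * sizeof(wchar_t),
             "C:\\Users\\william\\Desktop\\My Document\\test2.txt");

    return 0;
}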

In PHP, meanwhile, I use the mb_convert_encoding function:

echo bin2hex(mb_convert_encoding("你好", "UTF-16", "UTF-8")); //result : 4f60 597d
echo bin2hex(mb_convert_encoding("恏絙", "UTF-16", "UTF-8")); //result : 604f 7d59

PHP gives the same result as the online tool, but when I use this encoding to print 你好 on a printer using the php_printer.dll functions, the printout becomes 恏絙, and vice versa. The C++ application, however, prints correctly. What could be wrong with PHP, and what is the solution?

Solution

They're both correct. The difference is in endian-ness.

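Judging by your output, plain "UTF-16" in mb_convert_encoding produces big-endian bytes (4f 60), while the C++ program dumps the wchar_t buffer exactly as it sits in memory, which is little-endian on a typical Windows machine (60 4f). You can make the byte order explicit by naming UTF-16BE or UTF-16LE instead of plain UTF-16.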
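To make the byte-order point concrete, here is a minimal C++ sketch. The helper name `utf16Hex` and its `bigEndian` flag are made up for illustration and are not part of either program in the question. It splits each 16-bit code unit into a high and a low byte and emits them in the requested order.

```cpp
#include <cstdio>
#include <initializer_list>
#include <string>

// Hypothetical helper (not from the question): serialise UTF-16 code units
// to bytes in an explicit byte order and return colon-separated hex.
std::string utf16Hex(const std::u16string& units, bool bigEndian)
{
    std::string out;
    char pair[3];  // two hex digits plus the terminating NUL
    for (char16_t u : units) {
        unsigned hi = (u >> 8) & 0xFF;  // most significant byte
        unsigned lo = u & 0xFF;         // least significant byte
        unsigned first  = bigEndian ? hi : lo;
        unsigned second = bigEndian ? lo : hi;
        for (unsigned byte : {first, second}) {
            if (!out.empty()) out += ':';
            std::snprintf(pair, sizeof pair, "%02X", byte);
            out += pair;
        }
    }
    return out;
}
```

With the two code units written as escapes, `utf16Hex(u"\u4F60\u597D", true)` gives `4F:60:59:7D` (the PHP and online-tool order) and `utf16Hex(u"\u4F60\u597D", false)` gives `60:4F:7D:59` (the order the C++ program printed).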

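Note that what you are printing is not the Unicode code points as such, but their UTF-16BE/LE (or UCS-2) byte representation. For characters in the Basic Multilingual Plane, such as these two, the 16-bit code unit has the same numeric value as the code point, so only the byte order differs. Characters outside that range are encoded as two code units and no longer match the code point.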
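As an illustration of how code points relate to UTF-16 code units, the sketch below encodes a single code point. The function name `toUtf16` is hypothetical, and it assumes the input is a valid code point (it does not reject values above 10FFFF or lone surrogates).

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value; no validation is performed.
std::u16string toUtf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: the code unit equals the code point.
        out += static_cast<char16_t>(cp);
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    }
    return out;
}
```

For 你 (4F60) this returns the single unit 4F60, whereas a code point such as 1F600 becomes the pair D83D DE00.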

EDIT: Using UTF-16LE in mb_convert_encoding will give you the reversed representation (604f 7d59), which matches the C++ output and appears to be what the printer expects.
