PHP and C++ for UTF-8 code unit in reverse order in Chinese character
The Unicode code points for the Chinese word 你好 are 4F60 and 597D respectively, which I got from this tool: http://rishida.net/tools/conversion/
The console application below prints the hexadecimal byte sequence of 你好 as 60:4F:7D:59. As you can see, the bytes are in the reverse order of each character's code point: 60 first and then 4F, instead of 4F and then 60. Why is that? Which one is correct, the tool or the console app? Or both?
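#include <stdio.h>
#include <wchar.h>

/* Write the first len bytes of buf to filename as colon-separated hex.
   len is passed explicitly because sizeof on a char* yields the size of
   the pointer, not the length of the data it points to. */
void printHex(const unsigned char *buf, size_t len, const char *filename)
{
    FILE *fp = fopen(filename, "w");
    if (fp == NULL) return;
    for (size_t i = 0; i < len; i++)
    {
        if (i > 0) fprintf(fp, ":");
        /* unsigned char avoids sign extension for bytes >= 0x80 */
        fprintf(fp, "%02X", buf[i]);
    }
    fprintf(fp, "\n");
    fclose(fp);
}

int main(int argc, char* argv[])
{
    const wchar_t *str3 = L"你好";
    /* Dump the wide string exactly as it is laid out in memory. */
    printHex((const unsigned char *)str3, wcslen(str3) * sizeof(wchar_t),
             "C:\\Users\\william\\Desktop\\My Document\\test2.txt");
    return 0;
}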
In PHP, I use the mb_convert_encoding function:
echo bin2hex(mb_convert_encoding("你好", "UTF-16", "UTF-8")); // result: 4f60597d
echo bin2hex(mb_convert_encoding("恏絙", "UTF-16", "UTF-8")); // result: 604f7d59
The PHP result is the same as the online tool's, but when I use this encoding to print 你好 on a printer using the php_printer.dll functions, the printout becomes 恏絙, and vice versa. The C++ application, however, prints correctly. What could be wrong in PHP, and what is the solution?
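They're both correct. The difference is in endianness.
The tool and PHP write each 16-bit code unit with the most significant byte first (big-endian), while your C++ program dumps the wchar_t string as it is laid out in memory, which on Windows on x86 is little-endian. Judging by the output you posted, "UTF-16" in mb_convert_encoding produces big-endian bytes. You can make the byte order explicit by using UTF-16BE or UTF-16LE instead.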
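Note that these byte sequences are not Unicode code points as such, but the UTF-16BE/LE (or UCS-2) byte representation of them. For characters in the Basic Multilingual Plane such as 你 and 好, the 16-bit code unit has the same value as the code point (4F60, 597D); only the order in which its two bytes are written differs.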
EDIT: Using UTF-16LE in mb_convert_encoding will give you the reverse representation, 604f7d59, which is the byte order your C++ application produces.