&Atilde;&copy; and other codes

Problem description

I got a file full of those codes, and I want to "translate" it into normal chars (a whole file, I mean). How can I do it?

Thank you very much.

Recommended answer

It looks like you originally had a UTF-8 file which has been interpreted as an 8-bit encoding (e.g. ISO-8859-15) and then entity-encoded. I say this because the sequence C3 A9 looks like a pretty plausible UTF-8 encoding sequence.

You will need to entity-decode it first, which gives you the UTF-8 encoding back. You could then use something like iconv to convert to an encoding of your choosing.

Looking at your example:

  • &Atilde;&copy; would be decoded as the byte sequence 0xC3A9
  • 0xC3A9 = 11000011 10101001 in binary
  • the leading 110 in the first octet tells us this could be interpreted as a UTF-8 two-byte sequence. As the second octet starts with 10, we're looking at something we can interpret as UTF-8. To do that, we take the last 5 bits of the first octet, and the last 6 bits of the second octet...
  • So, interpreted as UTF-8 it's 00011101001 = 0xE9 = é (U+00E9, LATIN SMALL LETTER E WITH ACUTE)
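
The bit arithmetic in the list above can be checked with a few lines of PHP. This is only a sketch of the two-byte case: the variable names are made up for illustration, and it does not handle three- or four-byte sequences.

```php
<?php
// The two bytes obtained by entity-decoding the example sequence.
$b1 = 0xC3; // 11000011
$b2 = 0xA9; // 10101001

// A two-byte UTF-8 sequence has the shape 110xxxxx 10yyyyyy.
$isTwoByteLead  = ($b1 & 0xE0) === 0xC0; // top three bits are 110
$isContinuation = ($b2 & 0xC0) === 0x80; // top two bits are 10

// Code point = last 5 bits of the first octet followed by
// the last 6 bits of the second octet.
$codePoint = (($b1 & 0x1F) << 6) | ($b2 & 0x3F);

printf("U+%04X\n", $codePoint); // prints U+00E9
```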

You mention wanting to handle this with PHP; something like this might do it for you:

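 // To load from a file, use
 // $file = file_get_contents("/path/to/filename.txt");
 // The example below uses a literal string to demonstrate the technique.

 $file = "Pr&Atilde;&copy;c&Atilde;&copy;dent is a French word";

 // Decode entities to single ISO-8859-1 bytes so that &Atilde;&copy; becomes
 // the byte pair C3 A9. Recent PHP versions default to UTF-8 here, so the
 // charset has to be passed explicitly.
 $utf8 = html_entity_decode($file, ENT_QUOTES, 'ISO-8859-1');

 // utf8_decode() is deprecated as of PHP 8.2; iconv() does the same conversion.
 $iso8859 = iconv('UTF-8', 'ISO-8859-1', $utf8);

 // $utf8 contains "Précédent is a French word" in UTF-8
 // $iso8859 contains "Précédent is a French word" in ISO-8859-1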
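
For the whole-file case the question asks about, the same entity-decoding step can be wrapped in a small function. This is a minimal sketch, not a definitive implementation: `fixEntityEncodedFile()` and both path parameters are hypothetical names, and it assumes the file was misread as ISO-8859-1 (a Windows-1252 misreading would need a different charset argument). Note that it also decodes ordinary entities such as `&amp;`.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Reads $inPath, entity-decodes it, and writes the UTF-8 result to $outPath.
function fixEntityEncodedFile(string $inPath, string $outPath): bool
{
    $raw = file_get_contents($inPath);
    if ($raw === false) {
        return false; // the input file could not be read
    }

    // Decode each entity to a single ISO-8859-1 byte, so that
    // &Atilde;&copy; becomes the byte pair C3 A9.
    $utf8 = html_entity_decode($raw, ENT_QUOTES, 'ISO-8859-1');

    // An empty pattern with the /u modifier only matches when the subject
    // is valid UTF-8, so this guards against writing out garbage.
    if (preg_match('//u', $utf8) !== 1) {
        return false;
    }

    return file_put_contents($outPath, $utf8) !== false;
}
```

A call such as `fixEntityEncodedFile('in.txt', 'out.txt')` returns `false` instead of writing anything when the decoded bytes are not valid UTF-8, which is a hint that the file was damaged in some other way.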
