Can there be 2 different UTF-8 encodings for the same character?

Views: 121  Posted: 2016/11/19 14:57:31  Tags: perl utf-8 character-encoding

Question

I'm writing an application that needs to transcode its input from UTF-8 to ISO-8859-1 (Latin 1).

Everything works fine, except that I sometimes get strange encodings for some umlaut characters. For example, the Latin 1 E with 2 dots (ë, 0xEB) usually arrives as UTF-8 0xC3 0xAB, but sometimes as 0xC3 0x83 0xC2 0xAB.

This has happened a number of times with input from different sources. Noting that the first and last bytes match what I expect, could there be an encoding rule that my library doesn't know about?

Recommended answer

$ "\xC3\x83\xC2\xAB"
Ã«
$ use Encode

$ decode 'UTF-8', "\xC3\x83\xC2\xAB"
ë
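
The transcript shows that decoding the four bytes once as UTF-8 leaves the two bytes 0xC3 0xAB, which are themselves the UTF-8 encoding of ë. A minimal sketch of how such a sequence can arise, assuming the already-encoded bytes were misread as ISO-8859-1 somewhere upstream and encoded again (the helper name `hex_bytes` is illustrative, not part of any library):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Character string holding one character: U+00EB (e with diaeresis).
my $char = "\x{EB}";

# Correct UTF-8: a single encoding step gives the two bytes C3 AB.
my $once = encode('UTF-8', $char);

# The assumed mistake: those bytes are misread as Latin-1 text
# (two characters, U+00C3 and U+00AB) and encoded to UTF-8 again.
my $misread = decode('ISO-8859-1', $once);
my $twice   = encode('UTF-8', $misread);

# Illustrative helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

print hex_bytes($once),  "\n";   # C3 AB
print hex_bytes($twice), "\n";   # C3 83 C2 AB
```

The second line of output matches the unexpected sequence from the question, which is why the first and last bytes look familiar.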

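You have double-encoded UTF-8: the bytes 0xC3 0xAB were at some point treated as two single-byte characters (Ã and «) and encoded as UTF-8 a second time. Encode::Repair is one way to deal with that.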
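
If installing a CPAN module is not an option, one layer of double encoding can also be undone with the core Encode module alone. The following is only a sketch under stated assumptions, not how Encode::Repair works internally: the helper name `fix_double_utf8` is hypothetical, and it assumes the intermediate misreading was ISO-8859-1 (not, say, Windows-1252). It also cannot tell real double encoding apart from text that genuinely contains a sequence such as "Ã«", so it may need a stricter check for some inputs.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: strip one layer of double encoding when the
# input looks double-encoded; otherwise return the bytes unchanged.
sub fix_double_utf8 {
    my ($bytes) = @_;

    # Decode once, strictly. Malformed input is returned untouched.
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $bytes unless defined $chars;

    # A character above U+00FF cannot be a misread single byte.
    return $bytes if $chars =~ /[^\x00-\xFF]/;

    # Map each character back to the byte it was (assumed) misread from.
    my $inner = encode('ISO-8859-1', $chars);

    # Accept the result only if those bytes are themselves valid UTF-8.
    my $ok = eval {
        decode('UTF-8', $inner, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? $inner : $bytes;
}

my $input  = "\xC3\x83\xC2\xAB";                            # double-encoded
my $utf8   = fix_double_utf8($input);                       # bytes C3 AB
my $latin1 = encode('ISO-8859-1', decode('UTF-8', $utf8));  # byte EB

printf "%02X\n", ord($latin1);   # EB
```

Correctly encoded input such as 0xC3 0xAB passes through unchanged, because the bytes left after the second step (0xEB alone) are not valid UTF-8 and the helper falls back to the original.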
