Convert character from UTF-8 to ISO-8859-1 manually


Question

I have the character "ö". If I look in this UTF-8 table, I see it has the hex value F6. If I look in the Unicode table, I see that "ö" has the indices E0 and 16. If I add the two, I get the hex value of the code point, F6. This is the binary value 1111 0110.

1) How do I get from the hex value F6 to the indices E0 and 16?
2) I don't know how to get from F6 to the two bytes C3 B6 ...

Because I didn't get the results, I tried to go the other way. "ö" is represented in ISO-8859-1 as "Ã¶". In the UTF-8 table I can see that "Ã" has the decimal value 195 and "¶" has the decimal value 182. Converted to bits, this is 1100 0011 1011 0110.

Process:

  1. Look in a table and get the Unicode code point for the character "ö". Calculated from the indices E0 and 16, you get U+00F6.

  2. According to the algorithm posted by wildplasser, you can calculate the encoded UTF-8 bytes C3 and B6.

  3. In binary form you get 1100 0011 1011 0110, which corresponds to the decimal values 195 and 182.

  4. If these values are interpreted as ISO 8859-1 (only 1 byte per character), then you get "Ã¶".

PS: I also found this link, which shows the values from step 2.

Answer

The pages you are using are confusing you somewhat. Neither your "UTF-8 table" nor your "Unicode table" is giving you the UTF-8 encoding of the code point. Both simply list the Unicode value of each character.

In Unicode, every character ("code point") has a unique number assigned to it. The character ö is assigned the code point U+00F6, which is F6 in hexadecimal, and 246 in decimal.
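As a quick check of those numbers, here is a minimal Python sketch. The variable names are illustrative only, and it relies on the built-in `ord`, which returns the code point of a one-character string.

```python
# Look up the Unicode code point of "ö" and show it in several bases.
ch = "ö"
code_point = ord(ch)

print(f"U+{code_point:04X}")   # U+00F6
print(code_point)              # 246
print(f"{code_point:08b}")     # 11110110
```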

UTF-8 is a representation of Unicode, using a sequence of between one and four bytes per Unicode code point. The transformation from 32-bit Unicode code points to UTF-8 byte sequences is described in that article - it is pretty simple to do, once you get used to it. Of course, computers do it all the time, but you can do it with a pencil and paper easily, and in your head with a bit of practice.

If you do that transformation, you will see that U+00F6 transforms to the UTF-8 sequence C3 B6, or 1100 0011 1011 0110 in binary, which is why that is the UTF-8 representation of ö.
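The pencil-and-paper transformation can be sketched in Python as follows. `utf8_encode` is a hypothetical helper written for illustration, not the algorithm from the linked article verbatim; it follows the standard UTF-8 bit layout (`0xxxxxxx`, `110xxxxx 10xxxxxx`, and so on).

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand (illustrative)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode scalar value")
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# U+00F6 = 1111 0110: the top two bits (11) go into the lead byte,
# the low six bits (11 0110) go into the continuation byte.
print(utf8_encode(0xF6).hex())  # c3b6
```

For U+00F6 the lead byte is `110 00011` (C3) and the continuation byte is `10 110110` (B6), which matches the sequence above.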

The other half of your question is about ISO-8859-1. This is a character encoding commonly called "Latin-1". The numeric values of the Latin-1 encoding are the same as the first 256 code points in Unicode, thus ö is F6 in Latin-1.
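A short Python sketch may help show both facts at once: the Latin-1 byte for ö equals its code point, and reading the UTF-8 bytes as if they were Latin-1 yields the two characters with values 195 and 182 seen in the question. It uses only the built-in `str.encode` and `bytes.decode`; the variable names are illustrative.

```python
text = "ö"

latin1_bytes = text.encode("latin-1")  # one byte: F6
utf8_bytes = text.encode("utf-8")      # two bytes: C3 B6

# In Latin-1 the byte value is simply the code point.
assert latin1_bytes[0] == ord(text) == 0xF6

# Misreading the UTF-8 bytes as Latin-1 turns each byte into its own character.
misread = utf8_bytes.decode("latin-1")
print(misread)                    # Ã¶
print([ord(c) for c in misread])  # [195, 182]
```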

Once you have converted between UTF-8 and standard Unicode code points (UTF-32), it should be trivial to get the Latin-1 encoding. However, not all UTF-8 sequences / Unicode characters have corresponding Latin-1 characters: only code points U+0000 through U+00FF do.
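Putting the two halves together, a manual UTF-8 to Latin-1 conversion might look like the sketch below. `utf8_to_latin1` is a hypothetical helper for illustration; it assumes well-formed input and, to stay short, does not reject overlong encodings or surrogates the way a strict decoder would.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Decode UTF-8 by hand and re-encode each code point as one Latin-1 byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        first = data[i]
        # The lead byte tells us the sequence length and holds the top bits.
        if first < 0x80:
            code_point, length = first, 1
        elif first >> 5 == 0b110:
            code_point, length = first & 0x1F, 2
        elif first >> 4 == 0b1110:
            code_point, length = first & 0x0F, 3
        elif first >> 3 == 0b11110:
            code_point, length = first & 0x07, 4
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")

        # Each continuation byte (10xxxxxx) contributes six more bits.
        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any(b >> 6 != 0b10 for b in tail):
            raise ValueError(f"invalid UTF-8 sequence at offset {i}")
        for b in tail:
            code_point = (code_point << 6) | (b & 0x3F)

        # Latin-1 only covers U+0000 through U+00FF.
        if code_point > 0xFF:
            raise ValueError(f"U+{code_point:04X} has no Latin-1 equivalent")
        out.append(code_point)
        i += length
    return bytes(out)


print(utf8_to_latin1(b"\xc3\xb6").hex())  # f6
```

In everyday code the same result comes from `data.decode("utf-8").encode("latin-1")`, which raises `UnicodeEncodeError` for characters outside Latin-1.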

See the excellent article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) for a better understanding of character encodings and transformations between them.
