Python UTF-8 Latin-1 displays wrong character

Question

I'm writing a very small script that converts Latin-1 characters to Unicode (I'm a complete beginner in Python).

I tried something like this:

def latin1_to_unicode(character):
    # Python 2: decode the byte string as Latin-1, then re-encode it as UTF-8
    uni = character.decode('latin-1').encode("utf-8")
    return uni

It works fine for characters that are not specific to the Latin-1 set, but if I try the following example:

print latin1_to_unicode('å')

It returns Ã¥ instead of å. The same goes for other letters such as æ and ø.

Can anyone please explain why this is happening? Thanks.

I have the # -*- coding: utf8 -*- declaration in my script, in case it matters.

Recommended answer

Your source code is encoded as UTF-8, but you are decoding the data as Latin-1. Don't do that; you are creating Mojibake.

Decode from UTF-8 instead, and don't encode again. print writes to sys.stdout, which is configured with your terminal or console codec (detected when Python starts), so it encodes the Unicode value for you.
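As a rough illustration of that advice, here is a minimal sketch written for Python 3, where byte strings and text are separate types (the original question uses Python 2). The function name utf8_bytes_to_text is hypothetical, and ascii() is used only so the output looks the same on any console:

```python
# Python 3 sketch; utf8_bytes_to_text is a hypothetical name, not from the question.
def utf8_bytes_to_text(data: bytes) -> str:
    """Decode UTF-8 bytes into text, with no re-encoding step afterwards."""
    return data.decode("utf-8")

# The two bytes that a UTF-8 editor or terminal produces for the letter å.
raw = b"\xc3\xa5"
text = utf8_bytes_to_text(raw)

# One character (U+00E5) comes back. Printing the text directly would let
# print() encode it for the console; ascii() keeps this demo console-independent.
print(len(text), ascii(text))
```

In Python 2 the equivalent change to the original function would be to call character.decode('utf-8') and return the resulting unicode object as-is.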

My terminal is configured for UTF-8, so when I enter the å character in my terminal, UTF-8 data is produced:

>>> 'å'
'\xc3\xa5'
>>> 'å'.decode('latin1')
u'\xc3\xa5'
>>> print 'å'.decode('latin1')
Ã¥

You can see that the character uses two bytes. When you save your Python source with an editor configured to use UTF-8, Python reads those exact same bytes from disk and puts them into your bytestring.
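The byte counts can be checked directly. This is a small Python 3 sketch (the variable names are only illustrative) that encodes the same character with both codecs:

```python
# Python 3 sketch: the same character, encoded with two different codecs.
char = "\u00e5"  # the letter å

utf8_bytes = char.encode("utf-8")
latin1_bytes = char.encode("latin-1")

print(utf8_bytes, len(utf8_bytes))      # b'\xc3\xa5' 2
print(latin1_bytes, len(latin1_bytes))  # b'\xe5' 1
```

UTF-8 needs two bytes for å while Latin-1 needs one, which is why the two codecs cannot be swapped for anything outside plain ASCII.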

Decoding those two bytes as Latin-1 produces two Unicode code points, one per byte, because Latin-1 maps every byte to exactly one character: 0xC3 becomes Ã and 0xA5 becomes ¥.
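To see the Mojibake form step by step, the following Python 3 sketch (again with illustrative variable names) decodes the UTF-8 bytes with the wrong codec, lists the resulting code points, and then reverses the mistake:

```python
# Python 3 sketch: reproduce the Mojibake, then undo it.
utf8_bytes = b"\xc3\xa5"  # UTF-8 encoding of å

# Wrong codec: each byte becomes its own character.
garbled = utf8_bytes.decode("latin-1")
print([hex(ord(c)) for c in garbled])  # ['0xc3', '0xa5'], i.e. Ã and ¥

# Reversing the mistake: back to the original bytes, then the right codec.
repaired = garbled.encode("latin-1").decode("utf-8")
print(ascii(repaired))  # '\xe5', the single character å
```

The round trip works here only because Latin-1 maps all 256 byte values, so no information is lost; it is a way to understand the problem rather than a substitute for decoding with the correct codec in the first place.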

You probably want to do some studying on the difference between Unicode and encodings, and how that relates to Python:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
  • Pragmatic Unicode by Ned Batchelder
  • The Python Unicode HOWTO
