Using extended ASCII codes with Python

Question

I've created a dictionary with Python, but I'm having problems with extended ASCII codes.

The loop that creates the dictionary is (ASCII numbers 128 to 165: é, à, etc.):

# extended ASCII codes
dictionnary = {}  # assumed to be created earlier in the original script
i = 128
while i <= 165:
    dictionnary[chr(i)] = 'extended ascii'
    i = i + 1

But when I try to use the dictionary:

>>> dictionnary['è']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: '\xc3\xa8'

I've got # -*- coding: utf-8 -*- in the header of the Python script. I've tried encode, decode, etc., but the result is always wrong.

To understand what happens, I've tried:

>>> ord('é')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found

>>> ord(u'é')
233

I'm confused by ord(u'é') because 'é' is number 130 in the extended ASCII table, not 233.

I understand that extended ASCII codes contain "two characters", but I don't understand how to solve the problem with the dictionary.

Thanks in advance! :-)

Recommended answer

Use unichr instead of chr. In Python 2, chr produces a str containing a single byte, whereas unichr produces a unicode string containing a single Unicode character. Do the lookups with Unicode strings too: d[u'é'], because d['é'] looks up the UTF-8 encoding of é, which is a two-byte str.
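One caveat when applying this to the loop in the question: unichr(i) returns the character whose Unicode code point is i, and é is code point 233 (as ord(u'é') showed), not 130. The following is a minimal sketch, assuming the "extended ASCII" table in the question is code page 437, where é is byte 130. The cp437 choice and the rewritten loop are illustrative assumptions, not part of the original answer. It decodes each byte value into a Unicode character before using it as a key, and it is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: dictionary keys are Unicode characters, never encoded byte strings.
# Assumption: the "extended ASCII" table in the question is code page 437.
dictionnary = {}
for i in range(128, 166):
    # Turn one byte value into the Unicode character it stands for in cp437.
    key = bytearray([i]).decode('cp437')
    dictionnary[key] = 'extended ascii'

# Look up with Unicode strings: u'\xe8' is è and u'\xe9' is é.
print(dictionnary[u'\xe8'])  # extended ascii
print(dictionnary[u'\xe9'])  # extended ascii
```

If the table in use is a different code page, only the codec name passed to decode would change; the keys stay Unicode either way.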

Your code involves three things: a Latin-1 encoded str, a UTF-8 encoded str, and a unicode string. Keeping clear which one you have at any point requires knowing how Python handles strings and having a decent understanding of Unicode and encodings.
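To make the three representations concrete, here is a small sketch (variable names are illustrative) that shows the same character é as a Unicode string, as UTF-8 bytes, and as Latin-1 bytes. It runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
text = u'\xe9'                          # Unicode string: the single character é
utf8_bytes = text.encode('utf-8')       # UTF-8 encoding: two bytes
latin1_bytes = text.encode('latin-1')   # Latin-1 encoding: one byte

print(ord(text))                        # 233
print(list(bytearray(utf8_bytes)))      # [195, 169]
print(list(bytearray(latin1_bytes)))    # [233]

# Decoding with the matching codec gets the Unicode string back.
print(utf8_bytes.decode('utf-8') == text)      # True
print(latin1_bytes.decode('latin-1') == text)  # True
```

The two-byte UTF-8 form is why ord('é') complained about a string of length 2, and why the KeyError in the question shows '\xc3\xa8' for è.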

No answer about encodings and Unicode is complete without a link to Joel Spolsky's article on the matter: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
