Working with UTF-8 in Python


Question

Since it is summer, I decided to learn a new language, and Python was my choice. What I would really like to learn is how to manipulate Arabic text using Python. I have found many great resources on using Python. However, when I apply what I learned to Arabic strings, I get numbers and letters combined together.

Take English, for example:

>>> ebook = 'The American English Dictionary'
>>> ebook[2]
'e'

Now, for Arabic:

>>> abook = 'القاموس العربي'
>>> abook[2]
'\xde'                  #the correct output should be 'ق'

However, using print works fine, as in:

>>> print abook[2]
ق

What do I need to modify to get Python to always recognize Arabic letters?

Recommended answer

Use Unicode explicitly:

>>> s = u'القاموس العربي'
>>> s
u'\u0627\u0644\u0642\u0627\u0645\u0648\u0633 \u0627\u0644\u0639\u0631\u0628\u064a'
>>> print s
القاموس العربي

>>> print s[2]
ق

Or even character by character:

>>> for i, c in enumerate(s):
...     print i,c
... 
0 ا
1 ل
2 ق
3 ا
4 م
5 و
6 س
7  
8 ا
9 ل
10 ع
11 ر
12 ب
13 ي
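The difference between the two transcripts comes down to bytes versus code points. Below is a minimal sketch of that distinction, written in Python 3 syntax (an assumption on my part; the transcripts above are Python 2, where a plain '...' literal is a byte string and u'...' is Unicode text). The variable names are only illustrative. The single byte '\xde' in the question also suggests that the console there used a one-byte Arabic code page rather than UTF-8; Windows-1256 is my guess, not something the question states.

```python
# -*- coding: utf-8 -*-
# Sketch only: Python 3 syntax, illustrative names.

# str: a sequence of Unicode code points
text = 'القاموس العربي'

# bytes: the same text encoded as UTF-8
data = text.encode('utf-8')

# Indexing text yields a whole character.
third_char = text[2]

# Slicing bytes yields raw bytes. Each Arabic letter here takes two bytes
# in UTF-8, so offset 2 is only the first byte of the second letter.
third_byte = data[2:3]

# Decoding turns the bytes back into text that can be indexed by character.
round_trip = data.decode('utf-8')

# Assumption: a legacy single-byte code page, shown only for comparison
# with the one-byte result in the question.
legacy_byte = text[2].encode('cp1256')

print(len(text), len(data))   # 14 27
print(repr(third_char), repr(third_byte), repr(legacy_byte))
print(round_trip == text)     # True
```

In Python 2 the same idea applies with different spellings: the byte string plays the role of data, and calling .decode() on it with the right encoding gives a unicode object that behaves like text.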

I recommend the Python Unicode page, which is short, practical, and useful.
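A pattern often suggested for this kind of work is to decode bytes into Unicode as soon as they enter the program and to encode only when writing out. The sketch below illustrates that with a UTF-8 text file. The helper names write_text and read_text and the file name abook.txt are hypothetical, and UTF-8 is assumed as the file encoding. It uses io.open, which accepts an encoding argument in both Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: helper names and the file name are hypothetical.
import io
import os
import shutil
import tempfile


def write_text(path, text, encoding='utf-8'):
    """Encode Unicode text to bytes only at the moment of writing."""
    with io.open(path, 'w', encoding=encoding) as handle:
        handle.write(text)


def read_text(path, encoding='utf-8'):
    """Decode bytes to Unicode text as soon as they are read."""
    with io.open(path, 'r', encoding=encoding) as handle:
        return handle.read()


tmp_dir = tempfile.mkdtemp()
try:
    path = os.path.join(tmp_dir, 'abook.txt')
    write_text(path, u'القاموس العربي')
    loaded = read_text(path)
    # loaded is Unicode text, so indexing works per character.
    third_char_from_file = loaded[2]
    print(third_char_from_file)
finally:
    # Remove the temporary directory created for this example.
    shutil.rmtree(tmp_dir)
```

Inside the program everything stays Unicode, so indexing and iteration behave as in the answer above; the encoding name appears only at the file boundary.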
