Python and UTF-8: kind of confusing
I am on Google App Engine with Python 2.5. My application has to handle multiple languages, so I have to deal with UTF-8.
I have searched a lot but have not found what I need.
1. What is the purpose of # -*- coding: utf-8 -*- ?
2. What is the difference between these two assignments?
s = u'Witaj świecie'
s = 'Witaj świecie'
'Witaj świecie' is a UTF-8 string.
3. When I save the .py file as UTF-8, do I still need the u before every string?
Writing u'blah' turns the literal into a different kind of string (type unicode rather than type str): a sequence of Unicode code points. Without the u, it is a sequence of bytes. Only bytes can be written to disk or to a network stream, but you generally want to work in Unicode (although Python, and some libraries, will do some of the conversion for you). The encoding (UTF-8) is the translation between the two. So yes, you should put u in front of all your literals; it will make your life much easier. See Programatic Unicode for a better explanation.
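As a rough illustration of the text/bytes distinction described above, here is a minimal sketch. The names text and data are only illustrative, and the snippet is written so that it also runs on Python 3, where unprefixed literals are already Unicode.

```python
# -*- coding: utf-8 -*-
text = u'Witaj świecie'        # text: a sequence of Unicode code points
data = text.encode('utf-8')    # bytes: what actually goes to disk or the network

print(len(text))               # 13 code points
print(len(data))               # 14 bytes, because ś takes two bytes in UTF-8

# Decoding with the same encoding restores the original text.
assert data.decode('utf-8') == text
```

In Python 2, the unprefixed literal 'Witaj świecie' in a UTF-8 source file holds the same bytes as data here, which is why its length differs from that of the u'' version.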
The coding line tells Python what encoding your file is in, so that Python can understand it. Again, reading from disk gives bytes, but Python wants to see characters. In Python 2, the default encoding for source code is ASCII, so the coding line is what lets you put things like ś directly in your .py file in the first place. Other than that, it doesn't change how your code works.
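One way to see the coding line at work is to compile the same source bytes under two different declarations. This is only a sketch: run_source and S_ACUTE_BYTES are hypothetical names, cp1250 is picked arbitrarily as a "wrong" encoding, and the snippet assumes Python 3, where compile() accepts bytes and honors the declaration.

```python
# The two-byte UTF-8 sequence for "ś" (U+015B), as it would sit in a saved .py file.
S_ACUTE_BYTES = u'\u015b'.encode('utf-8')


def run_source(declared_encoding):
    """Compile a tiny module whose bytes are fixed but whose coding line varies."""
    header = ('# -*- coding: %s -*-\n' % declared_encoding).encode('ascii')
    source = header + b"s = u'" + S_ACUTE_BYTES + b"'\n"
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return namespace['s']


as_utf8 = run_source('utf-8')      # bytes decoded as declared: one character
as_cp1250 = run_source('cp1250')   # same bytes, wrong declaration: two characters

print(len(as_utf8), len(as_cp1250))
```

The bytes on disk never change; only the declaration changes how Python turns them into characters.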