Python Encoding Issue

Question

I am really lost in all the encoding/decoding issues with Python. Having read quite a few docs about how to handle incoming text properly, I still have issues with a few languages, like Korean. Anyhow, here is what I am doing.

korean_text = korean_text.encode('utf-8', 'ignore')
korean_text = unicode(korean_text, 'utf-8')

I save the above data to the database, which goes through fine.

Later, when I need to display the data, I fetch the content from the db and do the following:

korean_text = korean_text.encode( 'utf-8' )
print korean_text

And all I see is '???' echoed on the browser. Can someone please let me know the right way to save and display the above data?

Thanks

Solution

Even having read some docs, you seem to be confused about how unicode works.

  • Unicode is not an encoding. Unicode is the absence of encodings.
  • utf-8 is not unicode. utf-8 is an encoding.
  • You decode utf-8 bytestrings to get unicode. You encode unicode using an encoding, say, utf-8, to get an encoded bytestring.
  • Only bytestrings can be saved to disk or a database, sent over a network, or printed on a printer or screen. Unicode only exists inside your code.
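
A minimal sketch of the decode/encode relationship in the list above. It assumes Python 3, where `str` plays the role of the `unicode` type discussed here and `bytes` is the bytestring; the sample word is only an illustration.

```python
# Python 3 assumed: str is unicode text, bytes is an encoded bytestring.
text = "한국어"                      # unicode: code points, no encoding attached

encoded = text.encode("utf-8")      # encode: unicode -> bytestring
decoded = encoded.decode("utf-8")   # decode: bytestring -> unicode

print(type(text).__name__, len(text))        # str 3 (three characters)
print(type(encoded).__name__, len(encoded))  # bytes 9 (three bytes per character)
print(decoded == text)                       # True: the round trip loses nothing
```

The same text has different lengths as unicode and as bytes, which is one quick way to tell which of the two you are holding.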

The good practice is to decode everything you get as early as possible, work with it decoded, as unicode, in all your code, and then encode it as late as possible, when the text is ready to leave your program, to screen, database or network.
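
The "decode early, encode late" practice could be sketched like this, again assuming Python 3. The function `truncate_field` and its parameters are hypothetical names made up for the example; the point is only that all processing happens on unicode, between one decode and one encode.

```python
def truncate_field(raw: bytes, encoding: str, max_chars: int) -> bytes:
    """Hypothetical handler: bytes in, bytes out, unicode in between."""
    text = raw.decode(encoding)        # decode as early as possible
    text = text.strip()[:max_chars]    # work on characters, not on bytes
    return text.encode(encoding)       # encode as late as possible


incoming = "  한국어 텍스트  ".encode("utf-8")   # pretend this arrived from outside
outgoing = truncate_field(incoming, "utf-8", 3)
print(outgoing.decode("utf-8"))  # 한국어
```

Slicing the raw bytes instead of the decoded text could cut a multi-byte character in half, which is one reason to keep the processing on the unicode side.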


Now for your problem:

If you have a text that came from the browser, say, from a form, then it is encoded. It is a bytestring. It is not unicode.

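You must then decode it to get unicode, using the encoding the browser used to encode it. That encoding comes from the browser itself: it is normally given by the charset parameter of the request's Content-Type header, and when the header carries no charset, browsers generally submit form data in the encoding of the page that contained the form.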
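
If your framework does not do this for you, reading the charset from a Content-Type header value might look like the sketch below. It assumes Python 3 and uses the standard library's `email.message.Message` to parse the header; the helper names and the utf-8 fallback are assumptions made for the example, not something the request itself guarantees.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, or a fallback."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default


def decode_body(body: bytes, content_type: str) -> str:
    """Decode a request body using the charset the client declared."""
    return body.decode(charset_from_content_type(content_type))


body = "한국어".encode("euc-kr")   # pretend the client sent EUC-KR bytes
print(decode_body(body, "text/plain; charset=EUC-KR"))  # 한국어
```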

Don't use 'ignore' when decoding. Since the browser said which encoding it is using, you shouldn't get any errors. Using 'ignore' means you will hide a bug if there is one.
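
A small illustration of how 'ignore' hides a bug, assuming Python 3. The wrong codec (ascii) is chosen on purpose here to stand in for any encoding mix-up.

```python
data = "한국어".encode("utf-8")

# Wrong codec + 'ignore': no error, but every character is silently dropped.
silently_lost = data.decode("ascii", "ignore")
print(repr(silently_lost))  # ''

# Wrong codec + default strict mode: the mistake surfaces immediately.
strict_failed = False
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    strict_failed = True
    print("decode failed:", exc.reason)
```

With strict decoding the failure points at the exact step where the encoding assumption is wrong; with 'ignore' the text just turns up empty or incomplete later.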

Perhaps your web framework of choice already does that. I know that Django, Pylons, Werkzeug and CherryPy all do. In that case you already get unicode.

Now that you have a decoded unicode string, you can encode it using whatever encoding you like for storage in the database. utf-8 is a good choice, since it can encode all unicode code points.
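
To see why the choice of storage encoding matters, here is a quick check of which encodings can represent a given text. It assumes Python 3; `can_encode` is a hypothetical helper written for this example.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "한국어 text"
for name in ("utf-8", "euc-kr", "latin-1", "ascii"):
    print(name, can_encode(sample, name))
```

For this sample, utf-8 and euc-kr succeed while latin-1 and ascii fail; utf-8 is the one in that list that would keep working if text in yet another script showed up.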

When you retrieve the data from the database, decode it using the same encoding you used to store it. Then encode it using the encoding you want to use on the page - the one declared in the html meta header <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>. If that encoding is the same one used in the previous step, you can skip the decode/reencode since the data is already encoded in utf-8.
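
The store / retrieve / display sequence could be sketched as follows, assuming Python 3 and an in-memory SQLite database with a BLOB column so that the bytes are visible at each step. The table name and the two encoding constants are made up for the example, and many database drivers accept unicode directly and do the encoding themselves, in which case the explicit encode/decode around the database is not needed.

```python
import sqlite3

STORAGE_ENCODING = "utf-8"   # assumed: encoding chosen for the database
PAGE_ENCODING = "utf-8"      # assumed: charset declared by the page

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (body BLOB)")

# Save: encode the unicode text just before it leaves the program.
text = "한국어"
conn.execute("INSERT INTO posts (body) VALUES (?)",
             (text.encode(STORAGE_ENCODING),))

# Load: decode with the same encoding that was used to store.
stored = conn.execute("SELECT body FROM posts").fetchone()[0]
loaded = stored.decode(STORAGE_ENCODING)

# Display: encode with the charset the page declares.
page_bytes = loaded.encode(PAGE_ENCODING)
conn.close()

print(loaded == text)        # True
print(page_bytes == stored)  # True: same encoding, so re-encoding changed nothing
```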

If you see ??? then the data is being lost at one of the steps above. To know exactly which one, more information is needed.
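
One common way literal question marks appear, shown only as an illustration and not as a diagnosis of the code in the question: somewhere the text is encoded to a charset that cannot represent it, with a 'replace' error handler. Python 3 is assumed.

```python
text = "한국어"

# Each character that ascii cannot represent is replaced by '?'.
lossy = text.encode("ascii", "replace")
print(lossy)  # b'???'

# The original characters are gone; decoding cannot bring them back.
print(lossy.decode("ascii"))  # ???
```

Database connections and columns configured with a narrow character set can perform a similar substitution on their own, so checking the stored bytes directly is a reasonable first step.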
