Python string encoding for a variable


Problem description


I know that in Python < 3, the unicode string 'Plants vs. Zombies䋢 2' can be encoded to UTF-8 as follows:

u"Plants vs. Zombies䋢 2".encode("utf-8")

What if I have a variable (say appName) instead of a string literal? Can I do it like this:

  appName = "Plants vs. Zombies䋢 2"
 u+appName.encode("utf-8")

With:

 appName = appName.encode('utf-8')

I get this error:

 'ascii' codec can't decode byte 0xe4 in position 18: ordinal not in range(128)

Solution

No. The u notation is only for string literals. Variables containing string data don't need the u, because the variable contains an object that is either a unicode string or a byte string. (I'm assuming here that appName contains string data; if it doesn't, it doesn't make sense to try to encode it. Convert it to a bytestring or unicode first.)

So your variable either contains a unicode string or a byte string. If it is a unicode string you can just do appName.encode("utf-8").
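As a minimal sketch of the unicode case, assuming the source file is saved as UTF-8 and reusing the name from the question: the u prefix goes on the literal only, and the variable is then encoded as-is.

```python
# -*- coding: utf-8 -*-
# The u prefix belongs to the literal; the variable just holds the resulting object.
appName = u"Plants vs. Zombies䋢 2"

# No prefix is needed (or possible) on the variable: call encode on it directly.
encoded = appName.encode("utf-8")

print(repr(encoded))
```

The result is a byte string; decoding it with "utf-8" gives back the original unicode string.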

If it is a byte string then it is already encoded with some encoding. If it's already encoded as UTF-8, then it's already how you want it and you don't need to do anything. If it's in some other encoding and you want to get it into UTF-8, you can do appName.decode('the-existing-encoding').encode("utf-8").
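The cases above could be folded into one small helper. This is only a sketch: to_utf8 and its source_encoding parameter are hypothetical names, not part of any library, and UTF-16 is used here merely as a stand-in for "some other encoding".

```python
# -*- coding: utf-8 -*-
text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_utf8(value, source_encoding="utf-8"):
    """Return value as UTF-8 bytes (hypothetical helper for illustration)."""
    if isinstance(value, text_type):
        # Unicode string: encode directly.
        return value.encode("utf-8")
    # Byte string: decode with its known encoding, then re-encode as UTF-8.
    return value.decode(source_encoding).encode("utf-8")


name = u"Plants vs. Zombies䋢 2"
utf16_bytes = name.encode("utf-16-le")  # bytes in a non-UTF-8 encoding

print(repr(to_utf8(name)))
print(repr(to_utf8(utf16_bytes, "utf-16-le")))
```

Note that the caller still has to supply the existing encoding of a byte string; the helper cannot guess it.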

Note that if you do what you show in your edited question, the result might not be what you expect. You have:

appName = "Plants vs. Zombies䋢 2"

Without the u on the string literal, you have created a bytestring in some encoding, namely the encoding of your source file. If your source file isn't in UTF-8, then you're in the last situation I described above. There is no way to "just make a string unicode" after you have created it as non-unicode. When you create it as non-unicode, you are creating it in a particular encoding, and you have to know what encoding that is in order to decode it to unicode (so you can then encode it to another encoding if you want).
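This also accounts for the error in the question. In Python 2, calling encode on a byte string first decodes it implicitly with the default ASCII codec, and that decode fails on the first non-ASCII byte. The sketch below makes that implicit step explicit so it behaves the same on Python 2 and 3; it assumes the source file was UTF-8, and the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# A byte string like the one a non-u literal produces in a UTF-8 source file.
raw = u"Plants vs. Zombies䋢 2".encode("utf-8")

try:
    # Python 2 performs this ASCII decode implicitly inside raw.encode("utf-8").
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the actual encoding works; the text can then be re-encoded.
text = raw.decode("utf-8")
utf8_again = text.encode("utf-8")
```

The failure is reported at position 18, the first byte after "Plants vs. Zombies", which is the same position as in the question's error message.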
