How can I compare a unicode type to a string in Python?

Question

I am trying to use a list comprehension that compares string objects, but one of the strings is utf-8, a byproduct of json.loads. Scenario:

us = u'MyString' # is the utf-8 string

Part one of my question: why does this return False?

us.encode('utf-8') == "MyString" ## False

Part two: how can I compare within a list comprehension?

myComp = [utfString for utfString in jsonLoadsObj
           if utfString.encode('utf-8') == "MyString"] #wrapped to read on S.O.

I'm using Google App Engine, which uses Python 2.7.

Here's a more complete example of the problem:

#json coming from remote server:
#response object looks like:  {"number1":"first", "number2":"second"}

data = json.loads(response)
k = data.keys()

# I need something like:
myList = [item for item in k if item=="number1"]  

#### I thought this would work:
myList = [item for item in k if item.encode('utf-8')=="number1"]

Recommended answer

You must be looping over the wrong data set. Just loop directly over the JSON-loaded dictionary; there is no need to call .keys() first:

data = json.loads(response)
myList = [item for item in data if item == "number1"]  

You may want to use u"number1" to avoid implicit conversions between Unicode and byte strings:

data = json.loads(response)
myList = [item for item in data if item == u"number1"]  

Both versions work fine:

>>> import json
>>> data = json.loads('{"number1":"first", "number2":"second"}')
>>> [item for item in data if item == "number1"]
[u'number1']
>>> [item for item in data if item == u"number1"]
[u'number1']
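
If one side of the comparison may still be a byte string (for example, a value read from a file or a socket), one option is to decode it explicitly rather than rely on implicit conversion. The sketch below assumes the bytes are UTF-8 encoded; to_text is a hypothetical helper introduced only for illustration, not something provided by the json module.

```python
import json


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


data = json.loads(u'{"number1": "first", "number2": "second"}')

# One target arrives as UTF-8 bytes, the other is already text.
target_bytes = u"number1".encode("utf-8")
target_text = u"number1"

# Normalize the target to text once, then compare text with text.
from_bytes = [key for key in data if key == to_text(target_bytes)]
from_text = [key for key in data if key == to_text(target_text)]
```

Both comprehensions select the same key, because the comparison is always made between two text values.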

Note that in your first example, us is not a UTF-8 string; it is unicode data that the json library has already decoded for you. A UTF-8 string, on the other hand, is a sequence of encoded bytes. You may want to read up on Unicode and Python to understand the difference:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
  • The Python Unicode HOWTO
  • Pragmatic Unicode by Ned Batchelder
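
To make the difference between unicode text and UTF-8 bytes concrete, here is a small sketch. The payload and its non-ASCII key are made up for illustration; the point is only that the decoded text and its UTF-8 encoding are different objects with different lengths.

```python
import json

# UTF-8 encoded bytes, as they might arrive from a remote server (made-up payload).
raw = b'{"caf\xc3\xa9": "first"}'

# Decode the bytes to text before parsing; json.loads then returns text keys.
data = json.loads(raw.decode("utf-8"))
key = next(iter(data))

# Encoding the text again yields the UTF-8 byte sequence.
encoded = key.encode("utf-8")

text_length = len(key)       # counts characters (code points)
byte_length = len(encoded)   # counts bytes; the accented letter takes two
round_trip = encoded.decode("utf-8") == key
```

The key has four characters but its UTF-8 form has five bytes, which is why the two should not be treated as interchangeable.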

On Python 2, your expectation that the test returns True is correct; you must be doing something else wrong:

>>> us = u'MyString'
>>> us
u'MyString'
>>> type(us)
<type 'unicode'>
>>> us.encode('utf8') == 'MyString'
True
>>> type(us.encode('utf8'))
<type 'str'>

There is no need to encode the strings to UTF-8 to make comparisons; use unicode literals instead:

myComp = [elem for elem in json_data if elem == u"MyString"]
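
Put together with the sample response from the question, the comparison might look like the sketch below. The response string stands in for the remote server's reply, and the membership test at the end is an alternative that is not part of the original answer: when only one key is wanted, a dictionary lookup can replace the list comprehension.

```python
import json

# Stand-in for the JSON text returned by the remote server.
response = '{"number1": "first", "number2": "second"}'
json_data = json.loads(response)

wanted = u"number1"

# Filter the keys with a unicode literal, with no encoding step.
my_comp = [elem for elem in json_data if elem == wanted]

# Alternative for a single key: test membership and fetch the value directly.
has_key = wanted in json_data
value = json_data.get(wanted)
```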
