How can I compare a unicode type to a string in Python?
Question
I am trying to use a list comprehension that compares string objects, but one of the strings is utf-8, the byproduct of json.loads. Scenario:
us = u'MyString' # is the utf-8 string
Part one of my question: why does this return False?
us.encode('utf-8') == "MyString" ## False
Part two - how can I compare within a list comprehension?
myComp = [utfString for utfString in jsonLoadsObj
if utfString.encode('utf-8') == "MyString"] #wrapped to read on S.O.
I'm using Google App Engine, which uses Python 2.7
Here's a more complete example of the problem:
#json coming from remote server:
#response object looks like: {"number1":"first", "number2":"second"}
data = json.loads(response)
k = data.keys()
I need something like:
myList = [item for item in k if item=="number1"]
#### I thought this would work:
myList = [item for item in k if item.encode('utf-8')=="number1"]
Recommended answer
You must be looping over the wrong data set. Just loop directly over the JSON-loaded dictionary; there is no need to call .keys() first:
data = json.loads(response)
myList = [item for item in data if item == "number1"]
You may want to use u"number1" to avoid implicit conversions between Unicode and byte strings:
data = json.loads(response)
myList = [item for item in data if item == u"number1"]
Both versions work fine:
>>> import json
>>> data = json.loads('{"number1":"first", "number2":"second"}')
>>> [item for item in data if item == "number1"]
[u'number1']
>>> [item for item in data if item == u"number1"]
[u'number1']
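Because iterating over a dictionary yields its keys, the same filter can also be expressed as a membership test when only the presence of the key matters. The following is a minimal sketch that reuses the sample response from the question; the names `wanted`, `my_list` and `has_key` are illustrative and not from the original post:

```python
import json

# Sample payload from the question; a real response would come from the server.
response = '{"number1": "first", "number2": "second"}'
data = json.loads(response)

wanted = u"number1"  # illustrative name, not from the original post

# Iterating a dict yields its keys, so calling .keys() first is unnecessary.
my_list = [key for key in data if key == wanted]

# If only presence matters, a membership test avoids the loop entirely.
has_key = wanted in data

print(my_list, has_key)
```

Whether the list or the boolean is more useful depends on what the surrounding code does with the result.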
Note that in your first example, us is not a UTF-8 string; it is unicode data that the json library has already decoded for you. A UTF-8 string, on the other hand, is a sequence of encoded bytes. You may want to read up on Unicode and Python to understand the difference:
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
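The difference between unicode text and its UTF-8 encoding is easiest to see with a non-ASCII character, where the two have different lengths. This is a small sketch using an illustrative value that is not taken from the question; it should behave the same on Python 2.7 and Python 3:

```python
text = u"caf\u00e9"             # unicode text: 4 code points
encoded = text.encode("utf-8")  # UTF-8 byte string: U+00E9 takes 2 bytes

print(len(text))     # 4
print(len(encoded))  # 5

# Decoding the bytes gives back the original text.
decoded = encoded.decode("utf-8")
print(decoded == text)  # True
```

For pure ASCII values such as "MyString" the encoded bytes and the text look identical, which is why the distinction is easy to overlook.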
On Python 2, your expectation that your test returns True would be correct; you must be doing something else wrong:
>>> us = u'MyString'
>>> us
u'MyString'
>>> type(us)
<type 'unicode'>
>>> us.encode('utf8') == 'MyString'
True
>>> type(us.encode('utf8'))
<type 'str'>
There is no need to encode the strings to UTF-8 to make comparisons; use unicode literals instead:
myComp = [elem for elem in json_data if elem == u"MyString"]
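If the value being compared might arrive as either a byte string or a unicode object, one option is to decode it once at the boundary and then compare text with text. The helper below is a hypothetical sketch: the name `to_text`, the UTF-8 default and the sample JSON list are assumptions for illustration, not part of the original answer.

```python
import json

def to_text(value, encoding="utf-8"):
    """Return value as unicode text, decoding byte strings if needed."""
    # On Python 2, bytes is an alias for str, so plain 'abc' is decoded too.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Assumed sample data; json.loads returns unicode strings for JSON strings.
json_data = json.loads('["MyString", "Other"]')
target = to_text(b"MyString")

my_comp = [elem for elem in json_data if elem == target]
print(my_comp)
```

Decoding the comparison value once, outside the comprehension, keeps the loop body to a plain text comparison.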