Deleting non-Unicode characters in Python


Question

I am trying to return a response to a request, but I get an error saying that the string contains non-Unicode characters. I am filtering them out, but the result then contains unicode-style string literals, and the badly formatted response crashes the app.

This is what I am trying to do:

unfiltered_string = str({'location_id': location.pk, 'name': location.location_name,'address': location.address+', '+location.locality+', '+location.region+' '+location.postcode, 'distance': location.distance.mi, })
filtered_string = str(filter(lambda x: x in string.printable, unfiltered_string)).encode("utf-8")
locations.append(filtered_string)

The trouble is that it appends a string that looks like this:

{'distance': 4.075068111513138, 'location_id': 1368, 'name': u'Stanford University', 'address': u'450 Serra Mall, Stanford, CA 94305'}

when I need each u'string' to be just 'string', like this:

{'distance': 4.075068111513138, 'location_id': 1368, 'name': 'Stanford University', 'address': '450 Serra Mall, Stanford, CA 94305'}

If I try using string.encode('ascii','ignore'), I still get

"{'location_id': 1368, 'address': u'450 Serra Mall, Stanford, CA 94305', 'distance': 4.075068111513138, 'name': u'Stanford University'}"

and now I also get extra quotation marks around the JSON.

Recommended answer

I'm going to go out on a limb here and assume that your goal is to ignore the Unicode-specific characters in your data. It's difficult to say anything definitive without a better explanation in the question, but if you want a "plain" string instead of a unicode one, I would suggest encoding with the ascii codec instead of utf-8.

<str>.encode('ascii')

If you want to remove the other characters, the encode function takes an optional second argument that tells it to ignore every character the specified codec can't handle:

<str>.encode('ascii', 'ignore')
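
As a rough illustration of the suggestion above, here is a minimal sketch written for Python 3, where encode returns bytes and the result has to be decoded back to get a text string. The helper name strip_non_ascii and the sample record are hypothetical and only stand in for the question's location data. The last step uses json.dumps, which is an assumption about what the app expects rather than part of the original answer. It is shown because the u'...' prefixes in the question come from calling str() on a dict, which prints the repr of each value.

```python
import json


def strip_non_ascii(text):
    """Drop every character that the ascii codec cannot encode."""
    # encode() returns bytes in Python 3, so decode back to get a str.
    return text.encode("ascii", "ignore").decode("ascii")


# Hypothetical record standing in for the question's location data.
location = {
    "location_id": 1368,
    "name": "Stanford Universit\u00e9",
    "address": "450 Serra Mall, Stanford, CA 94305",
    "distance": 4.075068111513138,
}

# Clean only the string values; numbers keep their original type.
cleaned = {
    key: strip_non_ascii(value) if isinstance(value, str) else value
    for key, value in location.items()
}

# Assumption: the client wants JSON. json.dumps writes double-quoted
# strings with no u'' prefixes, unlike str() applied to a dict.
payload = json.dumps(cleaned)
```

Cleaning the values before serializing, rather than filtering the characters of the already stringified dict, avoids having to undo the repr formatting afterwards.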
