Python UTF-8 conversion problem

Problem description

In my database, I have stored some UTF-8 characters, for example 'α' in the "name" field.

When I read this back through the Django ORM, I get something like:

>>> p.name
u'\xce\xb1'
>>> print p.name
Î±

I was hoping for 'α'.

After some digging, I found that if I do:

>>> a = 'α'
>>> a
'\xce\xb1'

So when Python displays '\xce\xb1' I get alpha, but when it displays u'\xce\xb1', is it double-encoding?

Why did I get u'\xce\xb1' in the first place? Is there a way I can just get back '\xce\xb1'?

Thanks. My knowledge of UTF-8 and Unicode handling really needs some help...

Solution

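What you seem to have is the individual bytes of a UTF-8 encoded string interpreted as Unicode code points: the two bytes 0xCE 0xB1 that encode 'α' became the two characters U+00CE and U+00B1, which print as 'Î±'. In Python 2 you can "decode" your string out of this strange form with: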

p.name = ''.join(chr(ord(x)) for x in p.name)

or, if you want a unicode object rather than a byte string:

p.name = ''.join(chr(ord(x)) for x in p.name).decode('utf8')
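The two lines above are Python 2, where `chr` returns a one-byte `str`. A minimal Python 3 sketch of the same repair, assuming the value is a `str` whose code points are all below 256: turning each code point back into the byte with the same number is exactly what the Latin-1 codec does, so `encode('latin-1')` recovers the raw bytes and `decode('utf-8')` then yields the intended text.

```python
# Mis-decoded value: each UTF-8 byte of 'α' became its own code point.
broken = '\xce\xb1'

# Python 3 equivalent of ''.join(chr(ord(x)) for x in p.name):
# one byte per code point (raises ValueError if any code point is > 255).
raw = bytes(ord(ch) for ch in broken)

# Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0..255, so this is the same thing.
assert raw == broken.encode('latin-1')

# Equivalent of the .decode('utf8') variant: reinterpret those bytes as UTF-8.
fixed = raw.decode('utf-8')

print(raw)           # b'\xce\xb1'
print(ascii(fixed))  # '\u03b1' (GREEK SMALL LETTER ALPHA)
```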

One way to get your strings "encoded" into this form is

''.join(unichr(ord(x)) for x in '\xce\xb1')
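The `unichr` expression above only exists in Python 2. In Python 3 the same byte-to-code-point mapping can be written with `chr` over a `bytes` object, or simply as `bytes.decode('latin-1')`; this sketch reproduces the bad value from the question:

```python
utf8_bytes = 'α'.encode('utf-8')  # b'\xce\xb1'

# Python 3 equivalent of ''.join(unichr(ord(x)) for x in '\xce\xb1'):
# iterating over bytes yields ints, and each int becomes the code point with that number.
broken = ''.join(chr(b) for b in utf8_bytes)

# The Latin-1 codec performs exactly this mapping.
assert broken == utf8_bytes.decode('latin-1')

print(ascii(broken))  # '\xce\xb1'
print(len(broken))    # 2: the characters 'Î' and '±', not a single alpha
```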

although I have a feeling your strings actually got into this state because different components of your system disagree on the encoding in use.
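A minimal Python 3 sketch of how such a disagreement can produce the value in the question, assuming (hypothetically) that one layer writes UTF-8 bytes while another, for example a database connection configured for Latin-1, reads them back with a different codec. `store` and `load` are illustrative stand-ins, not Django or database APIs. It also shows where real double encoding comes from: saving the broken value again through the UTF-8 writer.

```python
def store(text: str) -> bytes:
    # Hypothetical writer: encodes text as UTF-8 before saving it.
    return text.encode('utf-8')


def load(data: bytes) -> str:
    # Hypothetical misconfigured reader: assumes the stored bytes are Latin-1.
    return data.decode('latin-1')


round_tripped = load(store('α'))
print(ascii(round_tripped))  # '\xce\xb1', the same value as p.name in the question

# Writing the broken value back through the UTF-8 writer makes it genuinely
# double-encoded: 'Î' (U+00CE) and '±' (U+00B1) each take two bytes now.
print(store(round_tripped))  # b'\xc3\x8e\xc2\xb1'
```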

You will probably have to fix the source of your bad "encoding" rather than just fixing the data currently in your database. The code above may be fine for converting your bad data once, but I would advise against putting it into your Django app.
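If you do run a one-off conversion, a cautious Python 3 sketch is to attempt the repair and keep the original value whenever it does not round-trip, so that correctly stored text is left alone. `fix_mojibake` is a hypothetical helper name, and how you iterate over and save your rows is left out on purpose.

```python
def fix_mojibake(value: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Latin-1, if that is what happened."""
    try:
        return value.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # UnicodeEncodeError: a code point above U+00FF, so the value is not
        # mis-decoded bytes. UnicodeDecodeError: the bytes are not valid UTF-8.
        # In both cases leave the value unchanged.
        return value


for name in ['\xce\xb1', 'plain ascii', '\u03b1', 'caf\xe9']:
    print(ascii(name), '->', ascii(fix_mojibake(name)))
```

This is a heuristic: a correctly stored Latin-1 string that happens to form valid UTF-8 (for example 'Ã©') would also be "repaired", so review the changed rows before saving them.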
